Files
xamxam/apps/admin/MIGRATION.md
Théophile Gervreau-Mercier 467aced734 Restructure repository and implement secure search feature
Phase 1: Consolidate shared infrastructure
- Create shared/ directory for common code
- Consolidate Database.php from front-backend and formulaire into unified shared/Database.php
  - Smart path detection for test.db vs posterg.db
  - Secure search with wildcard escaping and input validation
  - Support both singleton and direct instantiation patterns
  - Full CRUD methods for admin functionality
- Move RateLimit.php to shared/ (30 requests/min)
- Update all require paths across apps to use shared/

Phase 2: Reorganize directory structure
- Rename front-backend/ → apps/public/
- Rename formulaire/ → apps/admin/
- Rename db/ → database/
- Update all file paths for new structure
- Create root .gitignore excluding databases, cache, logs

Implement secure search feature
- Add apps/public/search.php with full-text search across theses
- Search filters: query, year, orientation, AP program, keywords
- Security features:
  - SQL injection prevention (prepared statements)
  - Wildcard injection prevention (escape % and _)
  - Input validation (max 200 chars, year range 1900-2100)
  - Rate limiting (30 req/min per IP)
  - Pagination limited to 100 results/page
  - XSS protection (htmlspecialchars on output)

Add comprehensive test suite
- Create apps/public/tests/ with proper structure
  - tests/Integration/SearchTest.php - 12 search scenarios
  - tests/Security/SecurityTest.php - vulnerability testing
  - tests/Unit/RateLimitTest.php - rate limit behavior
- Create database/fixtures/CreateTestDatabase.php
- Add apps/public/run-tests.php test runner
- All tests passing (4/4 suites)

Update deployment configuration
- Rename justfile 'sync' recipe to 'deploy'
- Create deploy group with separate deploy-public and deploy-admin
- Add test-deploy recipe for test database
- Exclude *.db, tests/, cache/, *.md from production deploy
- Deploy shared/ to both public and admin locations

Stats: +4482 insertions, -654 deletions across 72 files
2026-02-02 18:53:58 +01:00

358 lines
10 KiB
Markdown

# Migration from YAML to SQLite
## Overview
The Post-ERG thesis submission form has been completely overhauled to use a SQLite database instead of flat YAML files. This provides better data integrity, querying capabilities, and prepares the system for a full-featured web application.
## What Changed
### Database Implementation
**Before:** Form data was saved as individual YAML files in `data/yaml/`, with file uploads scattered in `data/content/` and `data/cover/`.
**After:** All thesis data is now stored in a relational SQLite database (`../db/posterg.db`) with proper normalization and foreign key relationships.
### New Architecture
```
Form Submission Flow:
1. User fills out enhanced form (index.php)
2. Form validates input and begins database transaction
3. Creates/links: author, thesis, supervisors, keywords, languages, formats
4. Uploads files with random names for security
5. Records file metadata in database
6. Commits transaction (all-or-nothing)
7. Redirects to confirmation page showing database data
```
### Database Schema Highlights
- **19 tables** including junction tables and views
- **Normalized structure** (3rd Normal Form)
- **Automatic timestamps** via triggers
- **Cascade deletes** for referential integrity
- **Predefined lookup tables** for orientations, AP programs, finalities, etc.
- **Views** for simplified querying (v_theses_full, v_theses_public)
## New Files
### `Database.php`
Database helper class providing:
- PDO connection with error handling
- Transaction management
- Find-or-create methods for entities
- Prepared statement helpers
- Lookup methods for all reference data
**Key Methods:**
```php
$db = new Database();
$authorId = $db->findOrCreateAuthor($name, $email);
$keywordId = $db->findOrCreateKeyword($keyword);
$orientations = $db->getAllOrientations();
$thesis = $db->getThesis($id);
```
## Modified Files
### `index.php`
**Enhancements:**
- Dynamically loads form options from database
- Added required fields per schema:
- Subtitle (optional)
- Synopsis (~200 words, required)
- Finality (Approfondi/Enseignement/Spécialisé)
- Languages (multiple selection with checkboxes)
- Formats (multiple selection with checkboxes)
- Better form organization with sections
- Improved accessibility (proper labels, IDs)
**New Form Fields:**
| Field | Type | Required | Notes |
|-------|------|----------|-------|
| Subtitle | Text | No | New field |
| Synopsis | Textarea | Yes | ~200 words |
| Finality | Select | Yes | From finality_types table |
| Languages | Checkboxes | Yes | Multiple selection |
| Formats | Checkboxes | No | Multiple selection |
### `formulaire.php`
**Complete rewrite** with:
1. **Transaction-Based Processing:**
- `BEGIN TRANSACTION` at start
- All insertions in single transaction
- `COMMIT` on success or `ROLLBACK` on error
- Ensures data consistency
2. **Prepared Statements:**
- All SQL queries use PDO prepared statements
- Protection against SQL injection
- Parameter binding for all user input
3. **Entity Creation:**
- Finds or creates authors (by name)
- Finds or creates supervisors (by name)
- Finds or creates keywords (by text)
- Links all entities via junction tables
4. **Identifier Generation:**
- Format: `YYYY-NNN` (e.g., "2026-001")
- Automatically increments per year
- Unique constraint in database
5. **File Handling:**
- Random cryptographic filenames (32 hex chars)
- Organized by year and identifier: `data/theses/YYYY/YYYY-NNN/`
- Cover images separate: `data/covers/`
- Metadata stored in `thesis_files` table
6. **Validation:**
- Year range: 2000 to current year + 1
- Max 10 keywords enforced
- At least one language required
- URL format validation
- File type and size validation
### `thanks.php`
**Complete redesign:**
- Reads from database using thesis ID
- Displays data from `v_theses_full` view
- Shows all relationships: authors, supervisors, keywords, languages, formats
- Lists uploaded files with metadata (type, size, date)
- Responsive CSS grid layout
- Publication status indicator
**Security:**
- Validates thesis ID (integer only)
- Uses prepared statements
- No path traversal vulnerability
- Error messages don't expose system details
## Database Files
### `../db/posterg.db`
Initialized SQLite database with:
- 19 tables (11 core, 5 junction, 3 reference)
- 2 views (v_theses_full, v_theses_public)
- Predefined data:
- 15 orientations
- 4 AP programs
- 3 finality types
- 2 languages (French, English)
- 7 format types
- 3 access types
- 4 static pages
### Schema Documentation
See `../db/README.md` and `../db/SETUP.md` for complete documentation.
## Security Improvements Retained
All security improvements from the previous commit are preserved:
✅ CSRF protection with session tokens
✅ Input validation and sanitization
✅ Prepared statements (SQL injection protection)
✅ Random filenames for uploads
✅ File type and size validation
✅ MIME type checking
✅ Error logging without exposing paths
✅ Path traversal protection
## Data Mapping
### YAML to Database Mapping
| Old YAML Field | New Database Location | Notes |
|----------------|----------------------|-------|
| `auteurice` | `authors.name` | Normalized, reusable |
| `email` | `authors.email` | Now in authors table |
| `année` | `theses.year` | Integer field |
| `titre` | `theses.title` | Required |
| - | `theses.subtitle` | New field |
| `description` | `theses.synopsis` | Renamed for clarity |
| `problématique` | (not yet used) | Can be added to schema |
| `orientation` | `theses.orientation_id` | Foreign key to orientations |
| `ap` | `theses.ap_program_id` | Foreign key to ap_programs |
| - | `theses.finality_id` | New field (required) |
| `promoteurice` | `supervisors.name` + `thesis_supervisors` | Many-to-many |
| `tag` | `keywords.keyword` + `thesis_keywords` | Many-to-many, max 10 |
| `lien` | `theses.baiu_link` | URL validation |
| `files` | `thesis_files` table | Full metadata |
| `couverture` | (stored as file, not in DB yet) | Could add cover_path column |
## Migration Path for Existing Data
If you have existing YAML files to import:
1. **Parse YAML files:**
```php
$yamlFiles = glob('data/yaml/*.yaml');
foreach ($yamlFiles as $file) {
$data = Yaml::parseFile($file);
// ...
}
```
2. **Insert into database:**
```php
$db->beginTransaction();
try {
$authorId = $db->findOrCreateAuthor($data['auteurice'], $data['email']);
// Insert thesis
// Link relationships
$db->commit();
} catch (Exception $e) {
$db->rollback();
}
```
3. **Verify data:**
```sql
SELECT COUNT(*) FROM theses;
SELECT * FROM v_theses_full LIMIT 5;
```
## Testing Checklist
Before production deployment:
- [ ] Form loads without errors
- [ ] All dropdown options populate from database
- [ ] Form submission creates thesis record
- [ ] Author is created or found correctly
- [ ] Supervisors linked properly
- [ ] Keywords created and linked (test max 10)
- [ ] Languages required (test validation)
- [ ] Formats optional (test multiple selection)
- [ ] Files upload successfully
- [ ] File metadata recorded in database
- [ ] Thanks page displays all data correctly
- [ ] Transaction rollback works on error
- [ ] CSRF token validated
- [ ] Invalid data rejected (year, URL, etc.)
## Known Limitations
1. **No cover_path column:** Cover images uploaded but path not stored in `theses` table (can be added)
2. **No problématique field:** Old field not yet in schema (can be added to `theses.remarks` or new column)
3. **File type detection:** Basic (by extension), could be enhanced
4. **No duplicate detection:** Same thesis can be submitted multiple times
5. **No edit capability:** Once submitted, no UI to edit (admin interface needed)
## Next Steps
1. **Initialize production database:**
```bash
cd /path/to/production/db
sqlite3 posterg.db < schema.sql
```
2. **Set permissions:**
```bash
chmod 644 posterg.db
chown www-data:www-data posterg.db
```
3. **Test form submission:**
- Submit test thesis
- Verify all fields saved
- Check file uploads
- Test thanks page
4. **Import existing data:**
- Create migration script
- Parse old YAML files
- Bulk insert into database
- Verify integrity
5. **Build admin interface:**
- CRUD operations for theses
- User management
- Approval workflow
- Bulk operations
6. **Build public website:**
- Search and filter theses
- Respect access controls
- Display thesis details
- Static pages management
## Compatibility Notes
### PHP Requirements
- PHP 7.4+ (tested on PHP 8.x)
- PDO extension with SQLite support
- Composer for Symfony YAML (still used for potential migration)
### Database
- SQLite 3.8.0+
- File-based database (no server needed)
- Single file: `db/posterg.db`
### Dependencies
```json
{
"require": {
"symfony/yaml": "^6.2",
"behat/transliterator": "^1.5"
}
}
```
Note: YAML library retained for potential data migration from old files.
## Backup Strategy
SQLite database is a single file - easy to backup:
```bash
# Simple copy
cp db/posterg.db db/backups/posterg_$(date +%Y%m%d).db
# SQL dump (portable)
sqlite3 db/posterg.db .dump > backups/posterg_$(date +%Y%m%d).sql
# Compressed backup
tar -czf backups/posterg_$(date +%Y%m%d).tar.gz db/posterg.db data/
```
Set up automated daily backups via cron.
## Performance Considerations
- **Indexes:** All critical foreign keys and search fields indexed
- **Views:** Pre-computed joins for common queries
- **Transactions:** Ensure atomicity without locking issues
- **File I/O:** Random filenames prevent directory listing overhead
For large datasets (1000+ theses):
- Consider WAL mode: `PRAGMA journal_mode=WAL;`
- Optimize with `ANALYZE;` periodically
- Monitor database size and `VACUUM` if needed
## Rollback Plan
If issues arise, you can roll back to YAML-based system:
1. Use previous jj commit: `jj checkout <commit-id>`
2. Old YAML files in `data/yaml/` still intact
3. Database changes don't affect old YAML code
4. Can run both systems in parallel during transition
## Support
For questions or issues:
- Schema documentation: `db/README.md`
- Setup guide: `db/SETUP.md`
- Security details: `SECURITY.md`
- Technical specs: `db/posterg_fiche-technique.md`
---
**Migration completed:** 2026-01-27
**Database version:** 1.0
**Form version:** 2.0 (SQLite)