Nginx config, working deploy, basic theme, repo cleanup

This commit is contained in:
Théophile Gervreau-Mercier
2026-02-05 17:33:10 +01:00
parent 2cb5436647
commit f23fbb481b
30 changed files with 4536 additions and 760 deletions

View File

@@ -1,153 +0,0 @@
# CSV Import Format Specification
## File Format
- **Encoding**: UTF-8
- **Delimiter**: Comma (`,`)
- **Header Rows**: First 4 rows are skipped during import
- Row 1: Empty
- Row 2: Headers (French labels)
- Row 3: Description row
- Row 4: Column names
- **Data Rows**: Start from row 5 onwards
## Column Structure
The CSV must contain exactly 21 columns in this order:
| Index | Field Name | Required | Type | Description |
|-------|------------|----------|------|-------------|
| 0 | identifier | No | String | Unique identifier for the thesis |
| 1 | title | **Yes** | String | Thesis title |
| 2 | subtitle | No | String | Thesis subtitle |
| 3 | authors | No | String | Author(s), comma-separated for multiple |
| 4 | contact | No | String | Contact email (associated with first author) |
| 5 | supervisors | No | String | Supervisor(s), comma-separated for multiple |
| 6 | formats | No | String | Format(s), comma-separated for multiple |
| 7 | year | **Yes** | Integer | Year of thesis (e.g., 2024) |
| 8 | ap | No | String | AP program code (see AP Codes section) |
| 9 | orientation | No | String | Orientation code (see Orientation Codes section) |
| 10 | finality | No | String | Finality name |
| 11 | keywords | No | String | Keywords, comma-separated (max 10) |
| 12 | synopsis | No | Text | Synopsis/abstract of the thesis |
| 13 | context | No | Text | Context note |
| 14 | remarks | No | Text | Additional remarks |
| 15 | language | No | String | Language (e.g., Français, English, Nederlands) |
| 16 | access | No | String | Access authorization |
| 17 | license | No | String | License information |
| 18 | size_info | No | String | File size information |
| 19 | jury_points | No | Float | Jury score (out of 20) |
| 20 | baiu_link | No | String | Link to BAIU (institutional archive) |
## Field Details
### Required Fields
- **title**: Must not be empty
- **year**: Must not be empty and must be a valid integer
### Multi-Value Fields
These fields accept multiple values separated by commas:
- **authors**: e.g., `"John Doe, Jane Smith"`
- **supervisors**: e.g., `"Prof. A, Prof. B"`
- **keywords**: Maximum 10 keywords, e.g., `"art, design, digital"`
- **formats**: e.g., `"PDF, Video, Installation"`
### Orientation Codes
Valid orientation codes and their full names:
```
SC = Sculpture
VI = Vidéographie
CA = Cinéma d'animation
IP = Installation-Performance
PE = Peinture
PH = Photographie
DE = Dessin
AN = Arts Numériques
GR = Graphisme
TY = Typographie
DN = Design Numérique
IL = Illustration
BD = Bande-Dessinée
SE = Sérigraphie
GV = Gravure
```
### AP Codes
Valid AP program codes:
- `DPM`
- `LIENS`
- `APS`
(These codes must match exactly what exists in the `ap_programs` table)
### Language Values
Languages should be provided with capital first letter:
- `Français`
- `English`
- `Nederlands`
- etc.
### Format Values
Common format values (case-insensitive, will be normalized):
- `PDF`
- `Video`
- `Audio`
- `Installation`
- `Web`
- etc.
## Import Behavior
### Row Processing
1. Empty rows (no title and no identifier) are skipped
2. Each row is processed in a transaction
3. If a row fails, it is skipped and logged, but processing continues
### Data Validation
- If title or year is missing, the row is rejected
- Invalid orientation codes result in no orientation being set (null)
- Invalid AP codes result in no AP program being set (null)
- Keywords are limited to first 10 if more are provided
### Data Normalization
- All string fields are trimmed of whitespace
- Language and format values are normalized (first letter capitalized, rest lowercase)
- Empty strings are converted to NULL in the database
### Entity Creation
- Authors, supervisors, and keywords are automatically created if they don't exist
- Existing authors are matched by name
- Contact email is only associated with the first author
## Example CSV Structure
```csv
Identifiant,Titre,Sous-titre,Auteur·ice(s),Contact,Promoteur·ice(s),Format,Année,AP,Orientation,Finalité,Mots-clés,Synopsis,Contexte,Remarques,Langue,Autorisation,License,taille,Points sur 20,lien BAIU
TFE-2024-001,Mon projet artistique,Exploration du numérique,"Alice Dupont, Bob Martin",alice@example.com,Prof. Smith,PDF,2024,DPM,AN,Création,art numérique,digital art,interactive installation,Un projet explorant l'intersection de l'art et de la technologie,Réalisé dans le cadre du master,Très bon projet,Français,Public,CC-BY,250MB,16.5,https://baiu.example.org/12345
TFE-2024-002,Design graphique moderne,,Charlie Brown,charlie@example.com,"Prof. A, Prof. B","PDF, Print",2024,LIENS,GR,Design,typographie,graphisme,design,Une exploration de la typographie contemporaine,,,English,Restricted,All rights reserved,50MB,15,
```
## Troubleshooting
### Common Issues
1. **Encoding problems**: Ensure file is saved as UTF-8
2. **Missing columns**: All 21 columns must be present, even if empty
3. **Line breaks in fields**: Ensure fields containing newlines are properly quoted
4. **Quote escaping**: Use double quotes (`""`) to escape quotes within fields
### Import Results
After import, the system will display:
- Number of theses successfully imported
- Number of rows skipped due to errors
- Detailed line-by-line results with success (✓) or error (✗) indicators
## Notes
- The import process preserves the order of authors, supervisors, and keywords
- The first author gets the contact email if provided
- Duplicate detection is not performed - each import creates new entries
- Failed rows do not stop the import process
- All errors are logged to the server error log

View File

@@ -1,357 +0,0 @@
# Migration from YAML to SQLite
## Overview
The Post-ERG thesis submission form has been completely overhauled to use a SQLite database instead of flat YAML files. This provides better data integrity, querying capabilities, and prepares the system for a full-featured web application.
## What Changed
### Database Implementation
**Before:** Form data was saved as individual YAML files in `data/yaml/`, with file uploads scattered in `data/content/` and `data/cover/`.
**After:** All thesis data is now stored in a relational SQLite database (`../db/posterg.db`) with proper normalization and foreign key relationships.
### New Architecture
```
Form Submission Flow:
1. User fills out enhanced form (index.php)
2. Form validates input and begins database transaction
3. Creates/links: author, thesis, supervisors, keywords, languages, formats
4. Uploads files with random names for security
5. Records file metadata in database
6. Commits transaction (all-or-nothing)
7. Redirects to confirmation page showing database data
```
### Database Schema Highlights
- **19 tables** including junction tables and views
- **Normalized structure** (3rd Normal Form)
- **Automatic timestamps** via triggers
- **Cascade deletes** for referential integrity
- **Predefined lookup tables** for orientations, AP programs, finalities, etc.
- **Views** for simplified querying (v_theses_full, v_theses_public)
## New Files
### `Database.php`
Database helper class providing:
- PDO connection with error handling
- Transaction management
- Find-or-create methods for entities
- Prepared statement helpers
- Lookup methods for all reference data
**Key Methods:**
```php
$db = new Database();
$authorId = $db->findOrCreateAuthor($name, $email);
$keywordId = $db->findOrCreateKeyword($keyword);
$orientations = $db->getAllOrientations();
$thesis = $db->getThesis($id);
```
## Modified Files
### `index.php`
**Enhancements:**
- Dynamically loads form options from database
- Added required fields per schema:
- Subtitle (optional)
- Synopsis (~200 words, required)
- Finality (Approfondi/Enseignement/Spécialisé)
- Languages (multiple selection with checkboxes)
- Formats (multiple selection with checkboxes)
- Better form organization with sections
- Improved accessibility (proper labels, IDs)
**New Form Fields:**
| Field | Type | Required | Notes |
|-------|------|----------|-------|
| Subtitle | Text | No | New field |
| Synopsis | Textarea | Yes | ~200 words |
| Finality | Select | Yes | From finality_types table |
| Languages | Checkboxes | Yes | Multiple selection |
| Formats | Checkboxes | No | Multiple selection |
### `formulaire.php`
**Complete rewrite** with:
1. **Transaction-Based Processing:**
- `BEGIN TRANSACTION` at start
- All insertions in single transaction
- `COMMIT` on success or `ROLLBACK` on error
- Ensures data consistency
2. **Prepared Statements:**
- All SQL queries use PDO prepared statements
- Protection against SQL injection
- Parameter binding for all user input
3. **Entity Creation:**
- Finds or creates authors (by name)
- Finds or creates supervisors (by name)
- Finds or creates keywords (by text)
- Links all entities via junction tables
4. **Identifier Generation:**
- Format: `YYYY-NNN` (e.g., "2026-001")
- Automatically increments per year
- Unique constraint in database
5. **File Handling:**
- Random cryptographic filenames (32 hex chars)
- Organized by year and identifier: `data/theses/YYYY/YYYY-NNN/`
- Cover images separate: `data/covers/`
- Metadata stored in `thesis_files` table
6. **Validation:**
- Year range: 2000 to current year + 1
- Max 10 keywords enforced
- At least one language required
- URL format validation
- File type and size validation
### `thanks.php`
**Complete redesign:**
- Reads from database using thesis ID
- Displays data from `v_theses_full` view
- Shows all relationships: authors, supervisors, keywords, languages, formats
- Lists uploaded files with metadata (type, size, date)
- Responsive CSS grid layout
- Publication status indicator
**Security:**
- Validates thesis ID (integer only)
- Uses prepared statements
- No path traversal vulnerability
- Error messages don't expose system details
## Database Files
### `../db/posterg.db`
Initialized SQLite database with:
- 19 tables (11 core, 5 junction, 3 reference)
- 2 views (v_theses_full, v_theses_public)
- Predefined data:
- 15 orientations
- 4 AP programs
- 3 finality types
- 2 languages (French, English)
- 7 format types
- 3 access types
- 4 static pages
### Schema Documentation
See `../db/README.md` and `../db/SETUP.md` for complete documentation.
## Security Improvements Retained
All security improvements from the previous commit are preserved:
✅ CSRF protection with session tokens
✅ Input validation and sanitization
✅ Prepared statements (SQL injection protection)
✅ Random filenames for uploads
✅ File type and size validation
✅ MIME type checking
✅ Error logging without exposing paths
✅ Path traversal protection
## Data Mapping
### YAML to Database Mapping
| Old YAML Field | New Database Location | Notes |
|----------------|----------------------|-------|
| `auteurice` | `authors.name` | Normalized, reusable |
| `email` | `authors.email` | Now in authors table |
| `année` | `theses.year` | Integer field |
| `titre` | `theses.title` | Required |
| - | `theses.subtitle` | New field |
| `description` | `theses.synopsis` | Renamed for clarity |
| `problématique` | (not yet used) | Can be added to schema |
| `orientation` | `theses.orientation_id` | Foreign key to orientations |
| `ap` | `theses.ap_program_id` | Foreign key to ap_programs |
| - | `theses.finality_id` | New field (required) |
| `promoteurice` | `supervisors.name` + `thesis_supervisors` | Many-to-many |
| `tag` | `keywords.keyword` + `thesis_keywords` | Many-to-many, max 10 |
| `lien` | `theses.baiu_link` | URL validation |
| `files` | `thesis_files` table | Full metadata |
| `couverture` | (stored as file, not in DB yet) | Could add cover_path column |
## Migration Path for Existing Data
If you have existing YAML files to import:
1. **Parse YAML files:**
```php
$yamlFiles = glob('data/yaml/*.yaml');
foreach ($yamlFiles as $file) {
$data = Yaml::parseFile($file);
// ...
}
```
2. **Insert into database:**
```php
$db->beginTransaction();
try {
$authorId = $db->findOrCreateAuthor($data['auteurice'], $data['email']);
// Insert thesis
// Link relationships
$db->commit();
} catch (Exception $e) {
$db->rollback();
}
```
3. **Verify data:**
```sql
SELECT COUNT(*) FROM theses;
SELECT * FROM v_theses_full LIMIT 5;
```
## Testing Checklist
Before production deployment:
- [ ] Form loads without errors
- [ ] All dropdown options populate from database
- [ ] Form submission creates thesis record
- [ ] Author is created or found correctly
- [ ] Supervisors linked properly
- [ ] Keywords created and linked (test max 10)
- [ ] Languages required (test validation)
- [ ] Formats optional (test multiple selection)
- [ ] Files upload successfully
- [ ] File metadata recorded in database
- [ ] Thanks page displays all data correctly
- [ ] Transaction rollback works on error
- [ ] CSRF token validated
- [ ] Invalid data rejected (year, URL, etc.)
## Known Limitations
1. **No cover_path column:** Cover images uploaded but path not stored in `theses` table (can be added)
2. **No problématique field:** Old field not yet in schema (can be added to `theses.remarks` or new column)
3. **File type detection:** Basic (by extension), could be enhanced
4. **No duplicate detection:** Same thesis can be submitted multiple times
5. **No edit capability:** Once submitted, no UI to edit (admin interface needed)
## Next Steps
1. **Initialize production database:**
```bash
cd /path/to/production/db
sqlite3 posterg.db < schema.sql
```
2. **Set permissions:**
```bash
chmod 644 posterg.db
chown www-data:www-data posterg.db
```
3. **Test form submission:**
- Submit test thesis
- Verify all fields saved
- Check file uploads
- Test thanks page
4. **Import existing data:**
- Create migration script
- Parse old YAML files
- Bulk insert into database
- Verify integrity
5. **Build admin interface:**
- CRUD operations for theses
- User management
- Approval workflow
- Bulk operations
6. **Build public website:**
- Search and filter theses
- Respect access controls
- Display thesis details
- Static pages management
## Compatibility Notes
### PHP Requirements
- PHP 7.4+ (tested on PHP 8.x)
- PDO extension with SQLite support
- Composer for Symfony YAML (still used for potential migration)
### Database
- SQLite 3.8.0+
- File-based database (no server needed)
- Single file: `db/posterg.db`
### Dependencies
```json
{
"require": {
"symfony/yaml": "^6.2",
"behat/transliterator": "^1.5"
}
}
```
Note: YAML library retained for potential data migration from old files.
## Backup Strategy
SQLite database is a single file - easy to backup:
```bash
# Simple copy
cp db/posterg.db db/backups/posterg_$(date +%Y%m%d).db
# SQL dump (portable)
sqlite3 db/posterg.db .dump > backups/posterg_$(date +%Y%m%d).sql
# Compressed backup
tar -czf backups/posterg_$(date +%Y%m%d).tar.gz db/posterg.db data/
```
Set up automated daily backups via cron.
## Performance Considerations
- **Indexes:** All critical foreign keys and search fields indexed
- **Views:** Pre-computed joins for common queries
- **Transactions:** Ensure atomicity without locking issues
- **File I/O:** Random filenames prevent directory listing overhead
For large datasets (1000+ theses):
- Consider WAL mode: `PRAGMA journal_mode=WAL;`
- Optimize with `ANALYZE;` periodically
- Monitor database size and `VACUUM` if needed
## Rollback Plan
If issues arise, you can roll back to YAML-based system:
1. Use previous jj commit: `jj checkout <commit-id>`
2. Old YAML files in `data/yaml/` still intact
3. Database changes don't affect old YAML code
4. Can run both systems in parallel during transition
## Support
For questions or issues:
- Schema documentation: `db/README.md`
- Setup guide: `db/SETUP.md`
- Security details: `SECURITY.md`
- Technical specs: `db/posterg_fiche-technique.md`
---
**Migration completed:** 2026-01-27
**Database version:** 1.0
**Form version:** 2.0 (SQLite)

View File

@@ -1,163 +0,0 @@
# Security Improvements
## Changes Made
### 1. Critical Vulnerability Fixes
#### Path Traversal in thanks.php (CRITICAL)
- **Before**: User could access ANY file on the system via `?file=../../../../etc/passwd`
- **After**:
- Validates file path using `realpath()` to resolve symlinks
- Ensures file is within allowed `data/yaml/` directory
- Verifies file extension is `.yaml`
- Proper error handling without exposing system paths
#### CSRF Protection
- **Before**: Form could be submitted from any website
- **After**:
- Session-based CSRF tokens generated for each form load
- Token validated on submission using timing-safe comparison (`hash_equals()`)
- Token cleared after successful submission
### 2. Input Validation & Sanitization
#### Deprecated Functions Replaced
- **Before**: Used `FILTER_SANITIZE_STRING` (deprecated in PHP 8.1+)
- **After**: Custom `sanitize_string()` function using `htmlspecialchars()` and `strip_tags()`
#### Enhanced Validation
- Required fields properly validated with custom `validate_required()` function
- Email validation using `FILTER_VALIDATE_EMAIL`
- URL validation using `FILTER_VALIDATE_URL`
- Year validation with reasonable range checking (2000 to current year + 1)
- Comprehensive error messages for validation failures
### 3. File Upload Security
#### Random Filenames
- **Before**: Used original or predictable filenames (author + timestamp)
- **After**:
- Generates cryptographically secure random filenames using `random_bytes()`
- Prevents file overwrites
- Prevents path traversal attacks via malicious filenames
- Stores mapping to original filename for reference
#### Enhanced File Validation
- MIME type checking using `finfo`
- File extension whitelist
- File size limits (50MB max)
- Proper error handling for upload errors
- Cover image restricted to JPEG/PNG only
### 4. Bug Fixes
- Fixed undefined variable `$memoireFolder` (used before definition)
- Fixed undefined variable `$resume` (should be `$description`)
- Fixed variable ordering (generate `$uniqueId` before using it)
- Added proper `__DIR__` prefix for absolute paths
### 5. Error Handling
- Try-catch block wraps entire form processing
- Detailed error logging (not exposed to users)
- User-friendly error messages
- Proper exit after redirect
- No system path exposure in error messages
## Nginx Configuration Notes
Since this form is behind nginx password authentication, additional security layers:
### Recommended nginx config:
```nginx
location /formulaire {
auth_basic "Restricted Access";
auth_basic_user_file /etc/nginx/.htpasswd;
# Rate limiting
limit_req zone=form_limit burst=5 nodelay;
# File upload size
client_max_body_size 100M;
# Timeout settings
client_body_timeout 60s;
# Prevent access to sensitive files
location ~ /\. {
deny all;
}
location ~ /(vendor|composer\.(json|lock)|error\.log)$ {
deny all;
}
}
```
## Additional Recommendations
### 1. Database Migration (In Progress)
Moving to SQLite will provide:
- Structured data storage
- Better query capabilities
- Easier data management
- Prepared statements for SQL injection prevention
### 2. File Storage
- Consider moving uploaded files outside web root
- Serve files through PHP script with access control
- Implement file scanning for malware if possible
### 3. Monitoring
- Regularly review `error.log` for suspicious activity
- Monitor file upload patterns
- Set up alerts for failed CSRF validations
### 4. Backup Strategy
- Regular backups of `data/` directory
- Version control for code changes
- Test restore procedures
### 5. PHP Configuration
Ensure these settings in php.ini:
```ini
file_uploads = On
upload_max_filesize = 100M
post_max_size = 100M
max_execution_time = 60
max_input_time = 60
memory_limit = 256M
# Security
expose_php = Off
allow_url_fopen = Off
allow_url_include = Off
display_errors = Off
log_errors = On
```
## Testing Checklist
- [ ] Form submission with all fields
- [ ] Form submission with minimal required fields
- [ ] Invalid email format
- [ ] Invalid URL format
- [ ] Invalid year
- [ ] File upload (various formats)
- [ ] Large file upload (>50MB, should fail)
- [ ] Invalid file types
- [ ] Multiple file uploads
- [ ] Cover image upload
- [ ] CSRF token validation (try submitting with wrong token)
- [ ] Path traversal attempt in thanks.php
- [ ] Error handling for missing directories
## Known Limitations
1. **No atomic transactions**: File operations and YAML save not atomic
2. **No rollback**: Failed submissions may leave partial files
3. **Session storage**: CSRF tokens in default PHP session (consider database sessions)
4. **No upload progress**: Large files have no progress indicator
5. **No duplicate detection**: Same submission can be made multiple times
These limitations will be addressed in the SQLite migration.