xamxam/formulaire/MIGRATION.md
Théophile Gervreau-Mercier 95f52d549e Add comprehensive thesis management system with database migration
This commit introduces a complete thesis management interface and migrates
the system from YAML-based storage to SQLite:

Core Changes:
- Add Database.php helper class with PDO connection and entity management
- Add list.php for viewing all theses with filtering and sorting
- Add edit.php for modifying existing thesis records
- Add import.php for migrating legacy YAML data to SQLite
- Add justfile with development tasks (serve, init-test-db, etc.)

Documentation:
- Add MIGRATION.md with complete migration guide and architecture docs
- Update README.md with database setup and Just recipe instructions
- Update .gitignore to exclude test databases and error logs

Modified Forms:
- Enhanced formulaire.php with transaction-based SQLite processing
- Updated index.php with database-driven form options
- Improved thanks.php to read from database views

The new architecture provides:
- Normalized database schema (19 tables, 2 views)
- Transaction safety and referential integrity
- CRUD operations for thesis management
- Filtering by year, orientation, AP program, publication status
- Secure file handling with metadata tracking

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-01-28 10:08:50 +01:00


Migration from YAML to SQLite

Overview

The Post-ERG thesis submission form has been completely overhauled to use a SQLite database instead of flat YAML files. This improves data integrity and querying, and prepares the system for a full-featured web application.

What Changed

Database Implementation

Before: Form data was saved as individual YAML files in data/yaml/, with file uploads scattered in data/content/ and data/cover/.

After: All thesis data is now stored in a relational SQLite database (../db/posterg.db) with proper normalization and foreign key relationships.

New Architecture

Form Submission Flow:
1. User fills out enhanced form (index.php)
2. Form validates input and begins database transaction
3. Creates/links: author, thesis, supervisors, keywords, languages, formats
4. Uploads files with random names for security
5. Records file metadata in database
6. Commits transaction (all-or-nothing)
7. Redirects to confirmation page showing database data
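
The all-or-nothing behaviour of steps 2 to 6 can be sketched with plain PDO. This is a minimal illustration, not the project's actual code: the two-table schema, the `submitThesis` function, and the sample names are assumptions made for the example.

```php
<?php
// Minimal sketch of an all-or-nothing submission using PDO and an in-memory SQLite database.
// Table and column names are simplified assumptions, not the project's real schema.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('PRAGMA foreign_keys = ON');
$pdo->exec('CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL)');
$pdo->exec('CREATE TABLE theses (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    author_id INTEGER NOT NULL REFERENCES authors(id)
)');

function submitThesis(PDO $pdo, string $authorName, ?string $title): bool
{
    $pdo->beginTransaction();
    try {
        $stmt = $pdo->prepare('INSERT INTO authors (name) VALUES (:name)');
        $stmt->execute([':name' => $authorName]);
        $authorId = (int) $pdo->lastInsertId();

        // A NULL title violates NOT NULL and throws, which triggers the rollback below.
        $stmt = $pdo->prepare('INSERT INTO theses (title, author_id) VALUES (:title, :author)');
        $stmt->execute([':title' => $title, ':author' => $authorId]);

        $pdo->commit();
        return true;
    } catch (PDOException $e) {
        // Undo every insert made since beginTransaction(), including the author row.
        $pdo->rollBack();
        error_log('Submission failed: ' . $e->getMessage());
        return false;
    }
}

$ok = submitThesis($pdo, 'Alice Example', 'A valid title');
$failed = submitThesis($pdo, 'Bob Example', null);
$authorCount = (int) $pdo->query('SELECT COUNT(*) FROM authors')->fetchColumn();
```

After both calls only one author row exists: the second submission fails on the thesis insert, so its author insert is rolled back as well.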

Database Schema Highlights

  • 19 tables (including junction tables) and 2 views
  • Normalized structure (3rd Normal Form)
  • Automatic timestamps via triggers
  • Cascade deletes for referential integrity
  • Predefined lookup tables for orientations, AP programs, finalities, etc.
  • Views for simplified querying (v_theses_full, v_theses_public)
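
One detail worth illustrating is that SQLite only enforces foreign keys, and therefore cascade deletes, when the `foreign_keys` pragma is enabled on the connection. The sketch below assumes a reduced two-table schema; the `stored_name` column is hypothetical and the real tables have more columns.

```php
<?php
// Sketch of a cascading delete in SQLite through PDO.
// The schema is a reduced assumption; only the cascade behaviour matters here.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// SQLite ignores foreign key constraints unless this pragma is set on each connection.
$pdo->exec('PRAGMA foreign_keys = ON');

$pdo->exec('CREATE TABLE theses (id INTEGER PRIMARY KEY, title TEXT NOT NULL)');
$pdo->exec('CREATE TABLE thesis_files (
    id INTEGER PRIMARY KEY,
    thesis_id INTEGER NOT NULL REFERENCES theses(id) ON DELETE CASCADE,
    stored_name TEXT NOT NULL
)');

$pdo->exec("INSERT INTO theses (id, title) VALUES (1, 'Example')");
$pdo->exec("INSERT INTO thesis_files (thesis_id, stored_name) VALUES (1, 'abc.pdf')");

// Deleting the thesis also removes its file rows because of ON DELETE CASCADE.
$pdo->exec('DELETE FROM theses WHERE id = 1');
$remaining = (int) $pdo->query('SELECT COUNT(*) FROM thesis_files')->fetchColumn();
```

Note that the cascade only removes database rows; the uploaded files on disk would still need to be deleted separately.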

New Files

Database.php

Database helper class providing:

  • PDO connection with error handling
  • Transaction management
  • Find-or-create methods for entities
  • Prepared statement helpers
  • Lookup methods for all reference data

Key Methods:

$db = new Database();
$authorId = $db->findOrCreateAuthor($name, $email);
$keywordId = $db->findOrCreateKeyword($keyword);
$orientations = $db->getAllOrientations();
$thesis = $db->getThesis($id);
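
A find-or-create method could look roughly like the following. This is a hypothetical standalone version written against plain PDO; the real `Database.php` implementation may differ, and the trimming of whitespace is an assumption.

```php
<?php
// Hypothetical find-or-create helper; the real Database.php may differ.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE keywords (id INTEGER PRIMARY KEY, keyword TEXT NOT NULL UNIQUE)');

function findOrCreateKeyword(PDO $pdo, string $keyword): int
{
    $keyword = trim($keyword);

    // Look for an existing row first so repeated keywords share one ID.
    $select = $pdo->prepare('SELECT id FROM keywords WHERE keyword = :keyword');
    $select->execute([':keyword' => $keyword]);
    $id = $select->fetchColumn();
    if ($id !== false) {
        return (int) $id;
    }

    // Not found: insert it and return the new row ID.
    $insert = $pdo->prepare('INSERT INTO keywords (keyword) VALUES (:keyword)');
    $insert->execute([':keyword' => $keyword]);
    return (int) $pdo->lastInsertId();
}

$first = findOrCreateKeyword($pdo, 'typography');
$second = findOrCreateKeyword($pdo, ' typography ');
$other = findOrCreateKeyword($pdo, 'sound');
$total = (int) $pdo->query('SELECT COUNT(*) FROM keywords')->fetchColumn();
```

The first two calls return the same ID, so the junction table can link several theses to one keyword row.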

Modified Files

index.php

Enhancements:

  • Dynamically loads form options from database
  • Added fields defined by the schema:
    • Subtitle (optional)
    • Synopsis (~200 words, required)
    • Finality (Approfondi/Enseignement/Spécialisé)
    • Languages (multiple selection with checkboxes)
    • Formats (multiple selection with checkboxes)
  • Better form organization with sections
  • Improved accessibility (proper labels, IDs)

New Form Fields:

| Field | Type | Required | Notes |
|-------|------|----------|-------|
| Subtitle | Text | No | New field |
| Synopsis | Textarea | Yes | ~200 words |
| Finality | Select | Yes | From finality_types table |
| Languages | Checkboxes | Yes | Multiple selection |
| Formats | Checkboxes | No | Multiple selection |
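
The Required column above could be enforced on the server along these lines. This is only a sketch: the `missingRequiredFields` function and the input keys (`synopsis`, `finality`, `languages`) are assumptions, not the names used in the actual form.

```php
<?php
// Hypothetical server-side check of the Required column; input keys are assumptions.
function missingRequiredFields(array $input): array
{
    $missing = [];

    // Single-value fields must be present and not blank.
    foreach (['synopsis', 'finality'] as $field) {
        if (trim((string) ($input[$field] ?? '')) === '') {
            $missing[] = $field;
        }
    }

    // Checkbox groups arrive as arrays; at least one language must be ticked.
    $languages = $input['languages'] ?? [];
    if (!is_array($languages) || count($languages) === 0) {
        $missing[] = 'languages';
    }

    return $missing;
}

$complete = missingRequiredFields(['synopsis' => 'Text', 'finality' => '2', 'languages' => ['fr']]);
$incomplete = missingRequiredFields(['synopsis' => '  ', 'subtitle' => 'Optional']);
```

Subtitle and Formats are never reported because the table marks them as optional.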

formulaire.php

Complete rewrite with:

  1. Transaction-Based Processing:

    • BEGIN TRANSACTION at start
    • All insertions in single transaction
    • COMMIT on success or ROLLBACK on error
    • Ensures data consistency
  2. Prepared Statements:

    • All SQL queries use PDO prepared statements
    • Protection against SQL injection
    • Parameter binding for all user input
  3. Entity Creation:

    • Finds or creates authors (by name)
    • Finds or creates supervisors (by name)
    • Finds or creates keywords (by text)
    • Links all entities via junction tables
  4. Identifier Generation:

    • Format: YYYY-NNN (e.g., "2026-001")
    • Automatically increments per year
    • Unique constraint in database
  5. File Handling:

    • Cryptographically random filenames (32 hex chars)
    • Organized by year and identifier: data/theses/YYYY/YYYY-NNN/
    • Cover images separate: data/covers/
    • Metadata stored in thesis_files table
  6. Validation:

    • Year range: 2000 to current year + 1
    • Max 10 keywords enforced
    • At least one language required
    • URL format validation
    • File type and size validation
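
Items 4 and 5 can be sketched as follows. The helpers `nextIdentifier` and `randomFilename` are hypothetical names, and the `theses` table is reduced to the two columns the example needs; the real `formulaire.php` may generate identifiers differently.

```php
<?php
// Sketch of YYYY-NNN identifier generation and random upload names.
// The theses table here is reduced to the columns the example needs (an assumption).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE theses (
    id INTEGER PRIMARY KEY,
    identifier TEXT NOT NULL UNIQUE,
    year INTEGER NOT NULL
)');

function nextIdentifier(PDO $pdo, int $year): string
{
    // Highest sequence number already used for this year (0 when none exist).
    // substr(identifier, 6) skips the "YYYY-" prefix and keeps the NNN part.
    $stmt = $pdo->prepare(
        'SELECT COALESCE(MAX(CAST(substr(identifier, 6) AS INTEGER)), 0)
         FROM theses WHERE year = :year'
    );
    $stmt->execute([':year' => $year]);
    $next = (int) $stmt->fetchColumn() + 1;
    return sprintf('%04d-%03d', $year, $next);
}

function randomFilename(string $extension): string
{
    // 16 random bytes encode to 32 hexadecimal characters.
    return bin2hex(random_bytes(16)) . '.' . $extension;
}

$insert = $pdo->prepare('INSERT INTO theses (identifier, year) VALUES (:identifier, :year)');
$first = nextIdentifier($pdo, 2026);
$insert->execute([':identifier' => $first, ':year' => 2026]);
$second = nextIdentifier($pdo, 2026);
$otherYear = nextIdentifier($pdo, 2025);

$name = randomFilename('pdf');
$targetPath = sprintf('data/theses/%d/%s/%s', 2026, $first, $name);
```

In a real request the identifier should be computed inside the same transaction as the insert; the UNIQUE constraint then acts as a safety net if two submissions race for the same number.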

thanks.php

Complete redesign:

  • Reads from database using thesis ID
  • Displays data from v_theses_full view
  • Shows all relationships: authors, supervisors, keywords, languages, formats
  • Lists uploaded files with metadata (type, size, date)
  • Responsive CSS grid layout
  • Publication status indicator

Security:

  • Validates thesis ID (integer only)
  • Uses prepared statements
  • No path traversal vulnerability
  • Error messages don't expose system details
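
The integer-only ID check might be written as below. `parseThesisId` is a hypothetical helper, and the assumption that IDs start at 1 comes from SQLite's usual rowid behaviour rather than from the source.

```php
<?php
// Hypothetical helper: accept only positive integer IDs from the query string.
function parseThesisId($raw): ?int
{
    // FILTER_VALIDATE_INT rejects anything that is not a plain integer,
    // and min_range excludes zero and negative values.
    $id = filter_var($raw, FILTER_VALIDATE_INT, ['options' => ['min_range' => 1]]);
    return $id === false ? null : $id;
}

$good = parseThesisId('42');
$traversal = parseThesisId('../../etc/passwd');
$negative = parseThesisId('-3');
$missing = parseThesisId(null);
```

A page would typically call this as `parseThesisId($_GET['id'] ?? null)` (the parameter name is an assumption) and show a generic error when the result is `null`, before passing the integer to a prepared statement.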

Database Files

../db/posterg.db

Initialized SQLite database with:

  • 19 tables (11 core, 5 junction, 3 reference)
  • 2 views (v_theses_full, v_theses_public)
  • Predefined data:
    • 15 orientations
    • 4 AP programs
    • 3 finality types
    • 2 languages (French, English)
    • 7 format types
    • 3 access types
    • 4 static pages

Schema Documentation

See ../db/README.md and ../db/SETUP.md for complete documentation.

Security Improvements Retained

All security improvements from the previous commit are preserved:

  • CSRF protection with session tokens
  • Input validation and sanitization
  • Prepared statements (SQL injection protection)
  • Random filenames for uploads
  • File type and size validation
  • MIME type checking
  • Error logging without exposing paths
  • Path traversal protection

Data Mapping

YAML to Database Mapping

| Old YAML Field | New Database Location | Notes |
|----------------|-----------------------|-------|
| auteurice | authors.name | Normalized, reusable |
| email | authors.email | Now in authors table |
| année | theses.year | Integer field |
| titre | theses.title | Required |
| - | theses.subtitle | New field |
| description | theses.synopsis | Renamed for clarity |
| problématique | (not yet used) | Can be added to schema |
| orientation | theses.orientation_id | Foreign key to orientations |
| ap | theses.ap_program_id | Foreign key to ap_programs |
| - | theses.finality_id | New field (required) |
| promoteurice | supervisors.name + thesis_supervisors | Many-to-many |
| tag | keywords.keyword + thesis_keywords | Many-to-many, max 10 |
| lien | theses.baiu_link | URL validation |
| files | thesis_files table | Full metadata |
| couverture | (stored as file, not in DB yet) | Could add cover_path column |

Migration Path for Existing Data

If you have existing YAML files to import:

  1. Parse YAML files:

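    // Composer autoloader; adjust the path if vendor/ lives elsewhere
    require __DIR__ . '/vendor/autoload.php';
    use Symfony\Component\Yaml\Yaml;

    $yamlFiles = glob('data/yaml/*.yaml');
    foreach ($yamlFiles as $file) {
        $data = Yaml::parseFile($file);
        // ...
    }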
    
  2. Insert into database:

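    $db->beginTransaction();
    try {
        $authorId = $db->findOrCreateAuthor($data['auteurice'], $data['email']);
        // Insert thesis
        // Link relationships
        $db->commit();
    } catch (Exception $e) {
        // Undo the partial import and record which file failed instead of failing silently
        $db->rollback();
        error_log('Import failed for ' . $file . ': ' . $e->getMessage());
    }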
    
  3. Verify data:

    SELECT COUNT(*) FROM theses;
    SELECT * FROM v_theses_full LIMIT 5;
    

Testing Checklist

Before production deployment:

  • Form loads without errors
  • All dropdown options populate from database
  • Form submission creates thesis record
  • Author is created or found correctly
  • Supervisors linked properly
  • Keywords created and linked (test max 10)
  • Languages required (test validation)
  • Formats optional (test multiple selection)
  • Files upload successfully
  • File metadata recorded in database
  • Thanks page displays all data correctly
  • Transaction rollback works on error
  • CSRF token validated
  • Invalid data rejected (year, URL, etc.)

Known Limitations

  1. No cover_path column: Cover images uploaded but path not stored in theses table (can be added)
  2. No problématique field: Old field not yet in schema (can be added to theses.remarks or new column)
  3. File type detection: Basic (by extension), could be enhanced
  4. No duplicate detection: Same thesis can be submitted multiple times
  5. No edit capability: Once submitted, no UI to edit (admin interface needed)

Next Steps

  1. Initialize production database:

    cd /path/to/production/db
    sqlite3 posterg.db < schema.sql
    
  2. Set permissions:

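    chown www-data:www-data posterg.db
    chmod 644 posterg.db
    # SQLite creates journal files next to the database, so this directory must also be writable by www-data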
    
  3. Test form submission:

    • Submit test thesis
    • Verify all fields saved
    • Check file uploads
    • Test thanks page
  4. Import existing data:

    • Create migration script
    • Parse old YAML files
    • Bulk insert into database
    • Verify integrity
  5. Build admin interface:

    • CRUD operations for theses
    • User management
    • Approval workflow
    • Bulk operations
  6. Build public website:

    • Search and filter theses
    • Respect access controls
    • Display thesis details
    • Static pages management

Compatibility Notes

PHP Requirements

  • PHP 7.4+ for the form code (tested on PHP 8.x); installing symfony/yaml ^6.2 through Composer requires PHP 8.1+
  • PDO extension with SQLite support
  • Composer for Symfony YAML (still used for potential migration)

Database

  • SQLite 3.8.0+
  • File-based database (no server needed)
  • Single file: db/posterg.db

Dependencies

{
    "require": {
        "symfony/yaml": "^6.2",
        "behat/transliterator": "^1.5"
    }
}

Note: YAML library retained for potential data migration from old files.

Backup Strategy

The SQLite database is a single file, so it is easy to back up:

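# Simple copy (only safe while no write is in progress)
cp db/posterg.db db/backups/posterg_$(date +%Y%m%d).db

# Online backup (consistent even while the form is in use)
sqlite3 db/posterg.db ".backup 'db/backups/posterg_$(date +%Y%m%d).db'"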

# SQL dump (portable)
sqlite3 db/posterg.db .dump > backups/posterg_$(date +%Y%m%d).sql

# Compressed backup
tar -czf backups/posterg_$(date +%Y%m%d).tar.gz db/posterg.db data/

Set up automated daily backups via cron.

Performance Considerations

  • Indexes: All critical foreign keys and search fields indexed
  • Views: Pre-computed joins for common queries
  • Transactions: Ensure atomicity; SQLite allows one writer at a time, so keep write transactions short
  • File I/O: Uploads are grouped into per-thesis directories (data/theses/YYYY/YYYY-NNN/), which keeps individual directories small

For large datasets (1000+ theses):

  • Consider WAL mode: PRAGMA journal_mode=WAL;
  • Optimize with ANALYZE; periodically
  • Monitor database size and VACUUM if needed
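
These maintenance commands can also be issued from PHP. The sketch below uses a temporary file in place of `db/posterg.db` (an assumption made so the example is self-contained), because in-memory databases cannot switch to WAL.

```php
<?php
// Sketch: enable WAL and run maintenance commands through PDO.
// A temporary file stands in for db/posterg.db, since in-memory databases cannot use WAL.
$path = tempnam(sys_get_temp_dir(), 'posterg_');
$pdo = new PDO('sqlite:' . $path);
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// The pragma returns the journal mode now in effect; WAL persists in the database file.
$mode = $pdo->query('PRAGMA journal_mode=WAL')->fetchColumn();

$pdo->exec('CREATE TABLE theses (id INTEGER PRIMARY KEY, title TEXT)');
$pdo->exec('ANALYZE'); // refresh the query planner statistics
$pdo->exec('VACUUM');  // rebuild the file to reclaim free pages

// Close the connection before removing the temporary database and its WAL side files.
$pdo = null;
foreach ([$path, $path . '-wal', $path . '-shm'] as $file) {
    if (file_exists($file)) {
        unlink($file);
    }
}
```

With WAL enabled, the `-wal` and `-shm` files next to the database belong to it, so a plain file copy taken while the database is in use should include them or be replaced by an online backup.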

Rollback Plan

If issues arise, you can roll back to the YAML-based system:

  1. Check out the previous commit with jj: jj new <commit-id>
  2. The old YAML files in data/yaml/ are still intact
  3. Database changes don't affect the old YAML code
  4. Both systems can run in parallel during the transition

Support

For questions or issues:

  • Schema documentation: db/README.md
  • Setup guide: db/SETUP.md
  • Security details: SECURITY.md
  • Technical specs: db/posterg_fiche-technique.md

Migration completed: 2026-01-27
Database version: 1.0
Form version: 2.0 (SQLite)