xamxam/database/README.md
Théophile Gervreau-Mercier 467aced734 Restructure repository and implement secure search feature
Phase 1: Consolidate shared infrastructure
- Create shared/ directory for common code
- Consolidate Database.php from front-backend and formulaire into unified shared/Database.php
  - Smart path detection for test.db vs posterg.db
  - Secure search with wildcard escaping and input validation
  - Support both singleton and direct instantiation patterns
  - Full CRUD methods for admin functionality
- Move RateLimit.php to shared/ (30 requests/min)
- Update all require paths across apps to use shared/

Phase 2: Reorganize directory structure
- Rename front-backend/ → apps/public/
- Rename formulaire/ → apps/admin/
- Rename db/ → database/
- Update all file paths for new structure
- Create root .gitignore excluding databases, cache, logs

Implement secure search feature
- Add apps/public/search.php with full-text search across theses
- Search filters: query, year, orientation, AP program, keywords
- Security features:
  - SQL injection prevention (prepared statements)
  - Wildcard injection prevention (escape % and _)
  - Input validation (max 200 chars, year range 1900-2100)
  - Rate limiting (30 req/min per IP)
  - Pagination limited to 100 results/page
  - XSS protection (htmlspecialchars on output)

Add comprehensive test suite
- Create apps/public/tests/ with proper structure
  - tests/Integration/SearchTest.php - 12 search scenarios
  - tests/Security/SecurityTest.php - vulnerability testing
  - tests/Unit/RateLimitTest.php - rate limit behavior
- Create database/fixtures/CreateTestDatabase.php
- Add apps/public/run-tests.php test runner
- All tests passing (4/4 suites)

Update deployment configuration
- Rename justfile 'sync' recipe to 'deploy'
- Create deploy group with separate deploy-public and deploy-admin
- Add test-deploy recipe for test database
- Exclude *.db, tests/, cache/, *.md from production deploy
- Deploy shared/ to both public and admin locations

Stats: +4482 insertions, -654 deletions across 72 files
2026-02-02 18:53:58 +01:00

Post-ERG Thesis Database Schema

SQLite database schema for managing final thesis projects (TFE) and doctoral theses at ERG.

Overview

This schema supports all requirements from the technical specifications (posterg_fiche-technique.md):

  • Multiple metadata categories (orientation, AP, finality, languages, formats, keywords)
  • Multiple authors and supervisors per thesis
  • Access control (Libre/Interne/Interdit)
  • Licensing management
  • File uploads (main TFE, annexes, written parts)
  • Jury notes and points
  • Publication workflow (submission → defense → publication)
  • Editable static pages (charte, about, licenses, contact)
  • Distinction between TFEs and doctoral theses

Database Structure

Core Tables

theses - Main thesis information

  • Basic metadata (title, subtitle, year, identifier)
  • Academic details (orientation, AP program, finality)
  • Content (synopsis, jury notes, duration/size)
  • Access control and licensing
  • Publication workflow status

authors - Student/author information

  • Name and contact email

supervisors - Thesis promoters

  • Name of supervisor/promoter

thesis_files - Uploaded files

  • Main TFE, annexes, written parts
  • File metadata (path, size, MIME type)

pages - Static content pages

  • Charte, about, licenses, contact pages
  • Easily editable content

Reference Tables (Predefined Lists)

  • orientations - Arts Numériques, Dessin, Cinéma d'animation, etc.
  • ap_programs - Narration Spéculative, DPM, APS, LIENS
  • finality_types - Approfondi, Enseignement, Spécialisé
  • languages - Français, Anglais, etc. (expandable)
  • format_types - Site web, Audio, Vidéo, Performance, etc.
  • keywords - Dynamic, expandable keyword list (max 10 per thesis)
  • access_types - Libre, Interne, Interdit
  • license_types - To be defined

Junction Tables (Many-to-Many)

  • thesis_authors - Links theses to authors
  • thesis_supervisors - Links theses to supervisors
  • thesis_languages - Multiple languages per thesis
  • thesis_formats - Multiple formats per thesis
  • thesis_keywords - Max 10 keywords per thesis
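
As an illustration of the junction-table pattern, here is a minimal sketch using Python's built-in sqlite3 module. The column names (thesis_id, author_id, name, title) and the helper authors_of are assumptions made for this example; the real schema.sql may define these tables differently.

```python
import sqlite3

# Hypothetical, simplified tables; the real schema.sql may use other columns.
SCHEMA = """
CREATE TABLE theses  (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE authors (id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
CREATE TABLE thesis_authors (
    thesis_id INTEGER NOT NULL REFERENCES theses(id)  ON DELETE CASCADE,
    author_id INTEGER NOT NULL REFERENCES authors(id) ON DELETE CASCADE,
    PRIMARY KEY (thesis_id, author_id)
);
"""

def authors_of(conn, thesis_id):
    """Return the names of all authors linked to one thesis, sorted by name."""
    rows = conn.execute(
        "SELECT a.name FROM authors a "
        "JOIN thesis_authors ta ON ta.author_id = a.id "
        "WHERE ta.thesis_id = ? ORDER BY a.name",
        (thesis_id,),
    )
    return [name for (name,) in rows]

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Placeholder sample data, not taken from the real database.
conn.execute("INSERT INTO theses (id, title) VALUES (1, 'Example thesis')")
conn.executemany("INSERT INTO authors (id, name) VALUES (?, ?)",
                 [(1, "Author A"), (2, "Author B")])
conn.executemany("INSERT INTO thesis_authors VALUES (?, ?)", [(1, 1), (1, 2)])
print(authors_of(conn, 1))
```

The composite primary key keeps the same author from being linked to the same thesis twice. The other junction tables would presumably follow the same shape.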

Key Features

1. Flexible Metadata

  • Multiple authors, supervisors, languages, formats, and keywords per thesis
  • Predefined lists with ability to add new entries
  • Proper normalization to avoid data duplication

2. Access Control

Three levels of access as specified:

  • Libre: Freely accessible online and in library
  • Interne: Physical access only, descriptive note online
  • Interdit: No physical/online access, descriptive note only

Important: Access can only be tightened (for example Libre → Interne), never loosened (as per specs)

3. Publication Workflow

The schema tracks the complete lifecycle:

  1. Submission (submitted_at) - Student submits TFE
  2. Defense (defense_date) - Soutenance takes place
  3. Jury Review (jury_note_added, jury_points, context_note)
  4. Publication (published_at, is_published = 1)

Important: TFEs are NOT published immediately upon submission. Publication waits until:

  • The defense has taken place
  • The jury has had the chance to add its optional context note (max 150 words)
  • The jury points have been recorded
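
The publication conditions could be checked in application code along these lines. This is a sketch under assumptions: the function name can_publish is hypothetical, dates are assumed to be stored as ISO strings (YYYY-MM-DD), and the context note is treated as optional, so only its length is checked when it is present.

```python
from datetime import date

MAX_CONTEXT_WORDS = 150  # limit stated for the jury context note

def can_publish(thesis, today):
    """Return (ok, reason) for a thesis given as a dict of column values."""
    if thesis.get("submitted_at") is None:
        return False, "not submitted"
    defense = thesis.get("defense_date")
    # Assumption: defense_date is an ISO date string such as '2025-06-20'.
    if defense is None or date.fromisoformat(defense) > today:
        return False, "defense has not taken place"
    if thesis.get("jury_points") is None:
        return False, "jury points not recorded"
    note = thesis.get("context_note") or ""
    if len(note.split()) > MAX_CONTEXT_WORDS:
        return False, "context note exceeds 150 words"
    return True, "ready"
```

Only when the check passes would the application set is_published = 1 and fill published_at.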

4. File Management

Support for multiple file types per thesis:

  • Main TFE work
  • Annexes
  • Written part
  • Other supporting files

5. Views for Easy Querying

v_theses_full - Complete thesis information with all related data

  • Joins the core tables with the reference and junction tables
  • Concatenates multi-valued fields (authors, supervisors, keywords, etc.) into one column each
  • Use for backend/admin interfaces

v_theses_public - Only published theses

  • Filtered to is_published = 1
  • Use for public-facing website
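
To show how the two views relate, here is a reduced sketch that builds them over a cut-down schema with only keywords as a multi-valued field. The table and column definitions are assumptions for the example and are much simpler than the real schema.sql.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE theses (id INTEGER PRIMARY KEY, title TEXT NOT NULL,
                     year INTEGER, is_published INTEGER NOT NULL DEFAULT 0);
CREATE TABLE keywords (id INTEGER PRIMARY KEY, keyword TEXT NOT NULL UNIQUE);
CREATE TABLE thesis_keywords (thesis_id INTEGER, keyword_id INTEGER);

-- One row per thesis, keywords collapsed into a single comma-separated column.
CREATE VIEW v_theses_full AS
SELECT t.id, t.title, t.year, t.is_published,
       GROUP_CONCAT(k.keyword, ', ') AS keywords
FROM theses t
LEFT JOIN thesis_keywords tk ON tk.thesis_id = t.id
LEFT JOIN keywords k ON k.id = tk.keyword_id
GROUP BY t.id;

-- Public view: same columns, published rows only.
CREATE VIEW v_theses_public AS
SELECT * FROM v_theses_full WHERE is_published = 1;

-- Placeholder sample rows.
INSERT INTO theses VALUES (1, 'Published example', 2025, 1),
                          (2, 'Draft example', 2025, 0);
INSERT INTO keywords VALUES (1, 'performance');
INSERT INTO thesis_keywords VALUES (1, 1);
""")

public_rows = conn.execute(
    "SELECT id, title, keywords FROM v_theses_public WHERE year = 2025"
).fetchall()
print(public_rows)
```

A view that joins several junction tables at once multiplies rows before aggregation, so a full version would need to guard against repeated values, for example by aggregating each multi-valued field in its own subquery.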

Usage

Initialize Database

sqlite3 posterg.db < schema.sql

Example Queries

Get all published theses from 2025

SELECT * FROM v_theses_public WHERE year = 2025;

Get theses by orientation

SELECT * FROM v_theses_full
WHERE orientation = 'Vidéographie';

Get theses with specific keyword

SELECT t.* FROM v_theses_full t
JOIN thesis_keywords tk ON t.id = tk.thesis_id
JOIN keywords k ON tk.keyword_id = k.id
WHERE k.keyword = 'performance';

Get theses awaiting publication (submitted but not published)

SELECT * FROM theses
WHERE submitted_at IS NOT NULL
  AND is_published = 0;

Update access type (can only restrict, not open)

-- Allowed: from Libre to Interne
-- (assumes access_types row 2 is Interne; look the id up by name if unsure)
UPDATE theses SET access_type_id = 2 WHERE id = 1;

-- Not allowed per specs: from Interdit to Libre
-- This should be enforced in application logic
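
One way the application could enforce the one-way rule is to rank the access types by restrictiveness and refuse any change that lowers the rank. The sketch below is an assumption about how this might look, not the project's implementation: the numeric ranks and the name is_allowed_access_change are hypothetical.

```python
# Restrictiveness rank per access type. The names come from the access_types
# list; the numeric ranks are an assumption of this sketch.
ACCESS_RANK = {"Libre": 0, "Interne": 1, "Interdit": 2}

def is_allowed_access_change(current, new):
    """True if moving from `current` to `new` keeps or tightens access."""
    return ACCESS_RANK[new] >= ACCESS_RANK[current]

print(is_allowed_access_change("Libre", "Interne"))    # restricting
print(is_allowed_access_change("Interdit", "Libre"))   # opening
```

The check would run before the UPDATE is issued. An unknown access type name raises KeyError, which fails closed.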

Data Import Notes

Based on Database_TFE_test.csv:

Current CSV Structure

  • Identifiant (e.g., "2025-002")
  • Titre, Sous-titre
  • Auteur·ice(s) - comma-separated if multiple
  • Contact - email
  • Promoteur·ice(s) - comma-separated if multiple
  • Format - comma-separated if multiple
  • Année
  • AP - abbreviation (DPM, LIENS, etc.)
  • Orientation - abbreviation (SC, VI, CA, etc.)
  • Finalité
  • Mots-clés - comma-separated, max 10
  • Synopsis
  • Contexte - jury context note
  • Remarques - internal notes
  • Langue - language(s)
  • Autorisation - access type
  • License - license type
  • taille - duration/size info
  • Points sur 20 - jury points
  • lien BAIU - institutional repository link

Import Considerations

  1. Parse comma-separated values for:

    • Authors (split and create entries in authors table)
    • Supervisors (split and create entries in supervisors table)
    • Formats (map to format_types)
    • Keywords (split and create/link in keywords)
    • Languages (split and map to languages)

  2. Map abbreviations:

    • Orientations: SC → Sculpture, VI → Vidéographie, CA → Cinéma d'animation, etc.
    • AP: DPM, LIENS, APS (exact match)

  3. Handle missing data:

    • Some fields in CSV are empty (AP, Orientation for some entries)
    • Use NULL in database

  4. Parse duration/size:

    • Examples: "128 pages", "78 pages + ?? minutes", "68 minutes"
    • Extract numeric values for duration_pages and duration_minutes
    • Store original string in file_size_info
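
The parsing steps above can be sketched as a few small helpers. The function names (split_multi, map_orientation, parse_duration) are hypothetical, and the abbreviation map contains only the three examples given above; the full list would come from the specification.

```python
import re

# Only the three abbreviations named in this README; the rest are not guessed.
ORIENTATIONS = {"SC": "Sculpture", "VI": "Vidéographie", "CA": "Cinéma d'animation"}

def split_multi(value):
    """Split a comma-separated CSV cell into trimmed, non-empty items."""
    return [part.strip() for part in (value or "").split(",") if part.strip()]

def map_orientation(abbrev):
    """Map an abbreviation to its full name; empty or unknown cells give None."""
    return ORIENTATIONS.get((abbrev or "").strip().upper())

def parse_duration(text):
    """Extract (pages, minutes) from strings such as '78 pages + ?? minutes'."""
    text = text or ""
    pages = re.search(r"(\d+)\s*pages?", text)
    minutes = re.search(r"(\d+)\s*minutes?", text)
    return (int(pages.group(1)) if pages else None,
            int(minutes.group(1)) if minutes else None)

print(split_multi("DPM, LIENS"))
print(parse_duration("78 pages + ?? minutes"))
```

A value of None maps directly to NULL when passed as a sqlite3 parameter. Two caveats: a name that itself contains a comma would be split incorrectly, and an unknown abbreviation is silently turned into None here, so a real import script would probably want to log it.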

Schema Design Decisions

Why SQLite?

  • Self-contained, serverless
  • Easy to backup (single file)
  • Good performance for this use case
  • Simple to integrate with various tools

Normalization Level

  • 3rd Normal Form (3NF) for most tables
  • Denormalized views for read performance
  • Balance between flexibility and simplicity

Extensibility

  • New languages can be added via languages table
  • Keywords are dynamic and grow with content
  • License types can be defined later
  • Static pages can be added via pages table

Constraints

  • ON DELETE CASCADE on junction tables (SQLite only enforces foreign keys when PRAGMA foreign_keys = ON is set on the connection)
  • UNIQUE constraints on lookup table names
  • NOT NULL on critical fields
  • Automatic timestamps via triggers
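
The following sketch shows how these constraints behave in SQLite, on a reduced schema. Table definitions, the updated_at column, and the trigger name theses_touch are assumptions for illustration and may not match schema.sql.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE theses (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    updated_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE keywords (id INTEGER PRIMARY KEY, keyword TEXT NOT NULL UNIQUE);
CREATE TABLE thesis_keywords (
    thesis_id  INTEGER NOT NULL REFERENCES theses(id)   ON DELETE CASCADE,
    keyword_id INTEGER NOT NULL REFERENCES keywords(id) ON DELETE CASCADE,
    PRIMARY KEY (thesis_id, keyword_id)
);
-- Refresh the timestamp whenever a thesis row is updated.
CREATE TRIGGER theses_touch AFTER UPDATE ON theses
BEGIN
    UPDATE theses SET updated_at = CURRENT_TIMESTAMP WHERE id = NEW.id;
END;
""")

conn.execute("INSERT INTO theses (id, title) VALUES (1, 'Example')")
conn.execute("INSERT INTO keywords (id, keyword) VALUES (1, 'performance')")
conn.execute("INSERT INTO thesis_keywords VALUES (1, 1)")

# UNIQUE: a second 'performance' keyword is rejected.
try:
    conn.execute("INSERT INTO keywords (keyword) VALUES ('performance')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# CASCADE: deleting the thesis also removes its junction rows.
conn.execute("DELETE FROM theses WHERE id = 1")
remaining_links = conn.execute("SELECT COUNT(*) FROM thesis_keywords").fetchone()[0]
print(duplicate_rejected, remaining_links)
```

Without the PRAGMA line the DELETE would succeed but leave orphaned rows in thesis_keywords, which is why every connection that writes to the database should set it.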

Important Business Rules

  1. No immediate publication: TFEs must go through defense before publication
  2. Access restriction is one-way: Can restrict but not open access
  3. Max 10 keywords per thesis (enforce in application)
  4. Jury context note max 150 words (enforce in application)
  5. Synopsis ~200 words (guideline, not hard limit)
  6. Multiple selections allowed for: languages, formats, authors, supervisors, keywords
  7. Doctoral theses: Use is_doctoral = 1 to distinguish from TFEs
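
Since the keyword limit is left to the application, a check along the following lines could run before keywords are linked to a thesis. The function name normalize_keywords is hypothetical, and removing case-insensitive duplicates before counting is an assumption of this sketch, not a rule from the specs.

```python
MAX_KEYWORDS = 10  # rule 3 above

def normalize_keywords(raw_keywords):
    """Trim, drop empties and case-insensitive duplicates, then enforce the limit."""
    seen, result = set(), []
    for kw in raw_keywords:
        cleaned = kw.strip()
        key = cleaned.casefold()
        if cleaned and key not in seen:
            seen.add(key)
            result.append(cleaned)
    if len(result) > MAX_KEYWORDS:
        raise ValueError(
            f"{len(result)} keywords given, at most {MAX_KEYWORDS} allowed"
        )
    return result

print(normalize_keywords([" performance ", "Performance", "vidéo"]))
```

Raising an error instead of truncating the list leaves the decision about which keywords to drop to the person submitting the thesis.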

Next Steps

  1. Create import script to load CSV data
  2. Define license types
  3. Build backend API for CRUD operations
  4. Implement authorization checks
  5. Create admin interface for easy editing
  6. Build public-facing website using views