# Post-ERG Thesis Database Schema

SQLite database schema for managing final thesis projects (TFE) and doctoral theses at ERG.

## Overview

This schema supports all requirements from the technical specifications (`posterg_fiche-technique.md`):

- Multiple metadata categories (orientation, AP, finality, languages, formats, keywords)
- Multiple authors and supervisors per thesis
- Access control (Libre/Interne/Interdit)
- Licensing management
- File uploads (main TFE, annexes, written parts)
- Jury notes and points
- Publication workflow (submission → defense → publication)
- Editable static pages (charte, about, licenses, contact)
- Distinction between TFEs and doctoral theses

## Database Structure

### Core Tables

**`theses`** - Main thesis information
- Basic metadata (title, subtitle, year, identifier)
- Academic details (orientation, AP program, finality)
- Content (synopsis, jury notes, duration/size)
- Access control and licensing
- Publication workflow status

**`authors`** - Student/author information
- Name and contact email

**`supervisors`** - Thesis promoters
- Name of supervisor/promoter

**`thesis_files`** - Uploaded files
- Main TFE, annexes, written parts
- File metadata (path, size, MIME type)

**`pages`** - Static content pages
- Charte, about, licenses, contact pages
- Easily editable content

### Reference Tables (Predefined Lists)

- `orientations` - Arts Numériques, Dessin, Cinéma d'animation, etc.
- `ap_programs` - Narration Spéculative, DPM, APS, LIENS
- `finality_types` - Approfondi, Enseignement, Spécialisé
- `languages` - Français, Anglais, etc. (expandable)
- `format_types` - Site web, Audio, Vidéo, Performance, etc.
- `keywords` - Dynamic, expandable keyword list (max 10 per thesis)
- `access_types` - Libre, Interne, Interdit
- `license_types` - To be defined

### Junction Tables (Many-to-Many)

- `thesis_authors` - Links theses to authors
- `thesis_supervisors` - Links theses to supervisors
- `thesis_languages` - Multiple languages per thesis
- `thesis_formats` - Multiple formats per thesis
- `thesis_keywords` - Max 10 keywords per thesis

## Key Features

### 1. Flexible Metadata

- Multiple authors, supervisors, languages, formats, and keywords per thesis
- Predefined lists with the ability to add new entries
- Proper normalization to avoid data duplication

### 2. Access Control

Three levels of access, as specified:

- **Libre**: Freely accessible online and in the library
- **Interne**: Physical access only; a descriptive note is shown online
- **Interdit**: No physical or online access; only a descriptive note is shown

**Important**: An access level can be made more restrictive but never more open (per the specifications).

### 3. Publication Workflow

The schema tracks the complete lifecycle:

1. **Submission** (`submitted_at`) - Student submits the TFE
2. **Defense** (`defense_date`) - The defense (soutenance) takes place
3. **Jury Review** (`jury_note_added`, `jury_points`, `context_note`)
4. **Publication** (`published_at`, `is_published = 1`)

**Important**: TFEs are NOT published immediately upon submission. Publication must wait for:

- The defense to take place
- The jury to add its optional context note (max 150 words)
- The jury points to be recorded

### 4. File Management

Support for multiple file types per thesis:

- Main TFE work
- Annexes
- Written part
- Other supporting files

### 5. Views for Easy Querying

**`v_theses_full`** - Complete thesis information with all related data
- Joins all tables
- Concatenates multiple values (authors, supervisors, keywords, etc.)
- Use for backend/admin interfaces

**`v_theses_public`** - Only published theses
- Filtered to `is_published = 1`
- Use for the public-facing website

## Usage

### Initialize Database

```bash
sqlite3 posterg.db < schema.sql
```

### Example Queries

#### Get all published theses from 2025

```sql
SELECT * FROM v_theses_public WHERE year = 2025;
```

#### Get theses by orientation

```sql
SELECT * FROM v_theses_full WHERE orientation = 'Vidéographie';
```

#### Get theses with a specific keyword

```sql
SELECT t.*
FROM v_theses_full t
JOIN thesis_keywords tk ON t.id = tk.thesis_id
JOIN keywords k ON tk.keyword_id = k.id
WHERE k.keyword = 'performance';
```

#### Get theses awaiting publication (submitted but not published)

```sql
SELECT *
FROM theses
WHERE submitted_at IS NOT NULL
  AND is_published = 0;
```

#### Update access type (can only restrict, not open)

```sql
-- Allowed: from Libre to Interne
-- (assumes thesis 1 is currently Libre and access_type_id 2 is Interne)
UPDATE theses SET access_type_id = 2 WHERE id = 1;

-- Not allowed per specs: from Interdit to Libre
-- This should be enforced in application logic
```

## Data Import Notes

Based on `Database_TFE_test.csv`:

### Current CSV Structure

- Identifiant - identifier (e.g., "2025-002")
- Titre, Sous-titre - title and subtitle
- Auteur·ice(s) - comma-separated if multiple
- Contact - email
- Promoteur·ice(s) - comma-separated if multiple
- Format - comma-separated if multiple
- Année - year
- AP - abbreviation (DPM, LIENS, etc.)
- Orientation - abbreviation (SC, VI, CA, etc.)
- Finalité - finality type
- Mots-clés - keywords, comma-separated, max 10
- Synopsis
- Contexte - jury context note
- Remarques - internal notes
- Langue - language(s)
- Autorisation - access type
- License - license type
- taille - duration/size info
- Points sur 20 - jury points (out of 20)
- lien BAIU - institutional repository link

### Import Considerations

1. **Parse comma-separated values** for:
   - Authors (split and create entries in the `authors` table)
   - Supervisors (split and create entries in the `supervisors` table)
   - Formats (split and map to `format_types`)
   - Keywords (split and create/link in `keywords`)
   - Languages (split and map to `languages`)
2. **Map abbreviations**:
   - Orientations: SC → Sculpture, VI → Vidéographie, CA → Cinéma d'animation, etc.
   - AP: DPM, LIENS, APS (exact match)
3. **Handle missing data**:
   - Some CSV fields are empty for certain entries (e.g., AP, Orientation)
   - Store these as NULL in the database
4. **Parse duration/size**:
   - Examples: "128 pages", "78 pages + ?? minutes", "68 minutes"
   - Extract numeric values for `duration_pages` and `duration_minutes`
   - Store the original string in `file_size_info`

## Schema Design Decisions

### Why SQLite?

- Self-contained, serverless
- Easy to back up (single file)
- Good performance for this use case
- Simple to integrate with various tools

### Normalization Level

- Third Normal Form (3NF) for most tables
- Denormalized views for convenient reads (SQLite views are not materialized, so they simplify queries rather than speed them up)
- Balance between flexibility and simplicity

### Extensibility

- New languages can be added via the `languages` table
- Keywords are dynamic and grow with content
- License types can be defined later
- Static pages can be added via the `pages` table

### Constraints

- CASCADE deletes on junction tables (SQLite only enforces foreign keys, including CASCADE, when `PRAGMA foreign_keys = ON` is set on each connection)
- UNIQUE constraints on lookup table names
- NOT NULL on critical fields
- Automatic timestamps via triggers

## Important Business Rules

1. **No immediate publication**: TFEs must go through the defense before publication
2. **Access restriction is one-way**: Access can be restricted but never reopened
3. **Max 10 keywords** per thesis (enforce in application)
4. **Jury context note max 150 words** (enforce in application)
5. **Synopsis ~200 words** (guideline, not a hard limit)
6. **Multiple selections allowed** for: languages, formats, authors, supervisors, keywords
7. **Doctoral theses**: Use `is_doctoral = 1` to distinguish them from TFEs

## Next Steps

1. Create an import script to load the CSV data
2. Define license types
3. Build a backend API for CRUD operations
4. Implement authorization checks
5. Create an admin interface for easy editing
6. Build the public-facing website using the views
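For step 1 (the CSV import script), the comma-separated columns and empty cells described under Import Considerations could be handled with two small helpers. This is a minimal Python sketch using only the standard library; it assumes the CSV is comma-delimited, that multi-value cells are quoted, and that individual names never contain commas. The helper names `split_multi` and `empty_to_none` and the sample row are hypothetical.

```python
from __future__ import annotations

import csv
import io


def split_multi(cell: str | None) -> list[str]:
    """Split a comma-separated CSV cell into trimmed, non-empty, de-duplicated items."""
    items: list[str] = []
    for part in (cell or "").split(","):
        item = part.strip()
        if item and item not in items:  # keep first-seen order, skip repeats
            items.append(item)
    return items


def empty_to_none(cell: str | None) -> str | None:
    """Return None for empty or whitespace-only cells so they are stored as NULL."""
    cleaned = (cell or "").strip()
    return cleaned or None


# Made-up sample row: two authors in one quoted cell, AP left empty.
sample = 'Auteur·ice(s),AP\n"Alice Martin, Bob Dupont",\n'
row = next(csv.DictReader(io.StringIO(sample)))
authors = split_multi(row["Auteur·ice(s)"])
ap = empty_to_none(row["AP"])
```

Keeping the original order while dropping duplicates means the first author listed stays first, in case author order matters for display.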
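The abbreviation mapping from Import Considerations can be kept in a lookup dictionary. Only the three orientation codes spelled out in the notes (SC, VI, CA) are included; the others are elided ("etc.") and would have to be filled in from the real data. The function names are hypothetical, and raising an error on an unknown code, rather than silently storing NULL, is a design choice of this sketch.

```python
from __future__ import annotations

# Orientation codes taken from the import notes; the remaining ones are
# elided in the notes and must be added from the real data.
ORIENTATIONS = {
    "SC": "Sculpture",
    "VI": "Vidéographie",
    "CA": "Cinéma d'animation",
}

# AP values are matched exactly, as the notes specify.
AP_PROGRAMS = {"DPM", "LIENS", "APS"}


def map_orientation(cell: str | None) -> str | None:
    """Map an orientation abbreviation to its full name; empty cells become None."""
    code = (cell or "").strip()
    if not code:
        return None  # missing in the CSV -> NULL in the database
    if code not in ORIENTATIONS:
        raise ValueError(f"Unknown orientation abbreviation: {code!r}")
    return ORIENTATIONS[code]


def map_ap(cell: str | None) -> str | None:
    """Validate an AP abbreviation by exact match; empty cells become None."""
    code = (cell or "").strip()
    if not code:
        return None
    if code not in AP_PROGRAMS:
        raise ValueError(f"Unknown AP program: {code!r}")
    return code
```

Failing loudly keeps a typo in the CSV from silently becoming a missing orientation, while genuinely empty cells still map to `None` (NULL).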
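Duration/size strings such as "128 pages", "78 pages + ?? minutes" and "68 minutes" could be parsed with a regular expression that picks out each number-plus-unit pair. This sketch assumes the units are always spelled `page(s)` or `minute(s)`; a placeholder like `??` has no digits, so that value stays NULL, and any other wording (for example "min") also leaves the numeric columns NULL while the original text is kept in `file_size_info`. The function name `parse_duration` is hypothetical.

```python
from __future__ import annotations

import re

# A number followed by a unit, e.g. "128 pages" or "68 minutes".
# Placeholders such as "??" contain no digits, so they never match.
_AMOUNT = re.compile(r"(\d+)\s*(pages?|minutes?)\b", re.IGNORECASE)


def parse_duration(cell: str | None) -> dict:
    """Extract page and minute counts, keeping the original text in file_size_info."""
    raw = (cell or "").strip()
    result = {
        "duration_pages": None,
        "duration_minutes": None,
        "file_size_info": raw or None,  # original string, or NULL if empty
    }
    for number, unit in _AMOUNT.findall(raw):
        column = "duration_pages" if unit.lower().startswith("page") else "duration_minutes"
        result[column] = int(number)
    return result
```

The returned keys match the column names listed under Import Considerations, so the dictionary can be passed straight to a parameterized `INSERT` using named placeholders.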
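Because keywords are dynamic, the import needs a get-or-create step so that a repeated keyword reuses a single row. Assuming `keywords.keyword` carries a UNIQUE constraint (as described under Constraints), SQLite's `INSERT OR IGNORE` followed by a `SELECT` does this. The tables below are a simplified, hypothetical stand-in for the real `schema.sql`, and the helper names are illustrative.

```python
import sqlite3

# Simplified, hypothetical tables; the real definitions live in schema.sql.
SCHEMA = """
CREATE TABLE theses (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE keywords (id INTEGER PRIMARY KEY, keyword TEXT NOT NULL UNIQUE);
CREATE TABLE thesis_keywords (
    thesis_id INTEGER NOT NULL REFERENCES theses(id) ON DELETE CASCADE,
    keyword_id INTEGER NOT NULL REFERENCES keywords(id) ON DELETE CASCADE,
    PRIMARY KEY (thesis_id, keyword_id)
);
"""


def get_or_create_keyword(conn: sqlite3.Connection, keyword: str) -> int:
    """Return the id of a keyword, inserting it first if it does not exist yet."""
    conn.execute("INSERT OR IGNORE INTO keywords (keyword) VALUES (?)", (keyword,))
    (keyword_id,) = conn.execute(
        "SELECT id FROM keywords WHERE keyword = ?", (keyword,)
    ).fetchone()
    return keyword_id


def link_keywords(conn: sqlite3.Connection, thesis_id: int, keywords: list) -> None:
    """Create missing keywords and link them to a thesis; repeated links are ignored."""
    for keyword in keywords:
        keyword_id = get_or_create_keyword(conn, keyword)
        conn.execute(
            "INSERT OR IGNORE INTO thesis_keywords (thesis_id, keyword_id) VALUES (?, ?)",
            (thesis_id, keyword_id),
        )


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # CASCADE is only enforced when this is on
conn.executescript(SCHEMA)
conn.execute("INSERT INTO theses (id, title) VALUES (1, 'Demo thesis')")
link_keywords(conn, 1, ["performance", "archive", "performance"])
```

Matching is case-sensitive, so `Performance` and `performance` would become two rows; normalizing case before linking is a decision left open here.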
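Business rules 3 to 5 are marked "enforce in application". A minimal Python check could look like the following. Counting words by splitting on whitespace is an approximation chosen for this sketch, and the function name `check_business_rules` is hypothetical. Because the synopsis length is only a guideline, exceeding it produces a warning rather than an error.

```python
from __future__ import annotations

MAX_KEYWORDS = 10             # hard limit (rule 3)
MAX_CONTEXT_NOTE_WORDS = 150  # hard limit for the jury context note (rule 4)
SYNOPSIS_TARGET_WORDS = 200   # guideline only (rule 5), so it only warns


def word_count(text: str) -> int:
    """Count whitespace-separated words (a simple approximation)."""
    return len(text.split())


def check_business_rules(
    keywords: list[str],
    context_note: str | None = None,
    synopsis: str | None = None,
) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for the keyword and text-length rules."""
    errors: list[str] = []
    warnings: list[str] = []
    distinct = {k.strip() for k in keywords if k.strip()}  # ignore blanks and repeats
    if len(distinct) > MAX_KEYWORDS:
        errors.append(f"{len(distinct)} keywords given, at most {MAX_KEYWORDS} allowed")
    if context_note and word_count(context_note) > MAX_CONTEXT_NOTE_WORDS:
        errors.append(f"context note exceeds {MAX_CONTEXT_NOTE_WORDS} words")
    if synopsis and word_count(synopsis) > SYNOPSIS_TARGET_WORDS:
        warnings.append(f"synopsis is longer than the ~{SYNOPSIS_TARGET_WORDS}-word guideline")
    return errors, warnings
```

Returning errors and warnings separately lets an admin interface block saving on errors while still showing the synopsis guideline as a soft hint.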
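The one-way access rule is described as an application-logic concern, but it could also be enforced in the database with a trigger. This sketch assumes the `access_types` ids are ordered from least to most restrictive (1 = Libre, 2 = Interne, 3 = Interdit). That ordering matches the `access_type_id = 2` example for Interne but is not guaranteed by the schema. The trigger name and the cut-down tables are hypothetical.

```python
import sqlite3

# Cut-down, hypothetical tables; ids assumed ordered from least to most restrictive.
SCHEMA = """
CREATE TABLE access_types (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
INSERT INTO access_types (id, name) VALUES (1, 'Libre'), (2, 'Interne'), (3, 'Interdit');

CREATE TABLE theses (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    access_type_id INTEGER REFERENCES access_types(id)
);

-- Reject any update that moves a thesis to a less restrictive access type.
CREATE TRIGGER trg_theses_access_only_restrict
BEFORE UPDATE OF access_type_id ON theses
FOR EACH ROW
WHEN NEW.access_type_id < OLD.access_type_id
BEGIN
    SELECT RAISE(ABORT, 'Access can only be restricted, never opened');
END;
"""


def connect() -> sqlite3.Connection:
    """Open an in-memory database with foreign keys on and the demo schema loaded."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


conn = connect()
conn.execute("INSERT INTO theses (id, title, access_type_id) VALUES (1, 'Demo', 1)")
conn.execute("UPDATE theses SET access_type_id = 2 WHERE id = 1")  # Libre -> Interne: allowed
try:
    conn.execute("UPDATE theses SET access_type_id = 1 WHERE id = 1")  # Interne -> Libre
except sqlite3.DatabaseError as exc:  # the trigger aborts this statement only
    print("rejected:", exc)
```

If the ids are not ordered this way in the real data, a dedicated restriction-level column on `access_types` would make the comparison explicit instead of relying on id order.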
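For the publication workflow, a backend function could refuse to publish a thesis until it has been submitted, defended, and graded. This sketch only checks that `submitted_at`, `defense_date`, and `jury_points` are filled in. It does not verify that the defense date is in the past, and it does not require the context note, which is optional. The column types, the `NotReadyError` class, and the `publish_thesis` name are assumptions.

```python
from __future__ import annotations

import sqlite3

# Simplified, hypothetical subset of the theses table.
SCHEMA = """
CREATE TABLE theses (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    submitted_at TEXT,
    defense_date TEXT,
    jury_points REAL,
    context_note TEXT,
    is_published INTEGER NOT NULL DEFAULT 0,
    published_at TEXT
);
"""

REQUIRED_BEFORE_PUBLICATION = ("submitted_at", "defense_date", "jury_points")


class NotReadyError(Exception):
    """Raised when a thesis has not yet completed the steps before publication."""


def publish_thesis(conn: sqlite3.Connection, thesis_id: int) -> None:
    """Publish a thesis only once it has been submitted, defended and graded."""
    row = conn.execute(
        "SELECT submitted_at, defense_date, jury_points, is_published "
        "FROM theses WHERE id = ?",
        (thesis_id,),
    ).fetchone()
    if row is None:
        raise LookupError(f"No thesis with id {thesis_id}")
    *required_values, is_published = row
    missing = [
        name
        for name, value in zip(REQUIRED_BEFORE_PUBLICATION, required_values)
        if value is None
    ]
    if missing:
        raise NotReadyError(f"Thesis {thesis_id} is missing: {', '.join(missing)}")
    if is_published:
        return  # already public; keep the original published_at
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "UPDATE theses SET is_published = 1, published_at = datetime('now') "
            "WHERE id = ?",
            (thesis_id,),
        )
```

Listing the missing steps in the error message gives the admin interface something concrete to show, rather than a bare refusal.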