mirror of
https://codeberg.org/PostERG/xamxam.git
synced 2026-05-06 19:19:19 +02:00
Restructure repository and implement secure search feature
Phase 1: Consolidate shared infrastructure
- Create shared/ directory for common code
- Consolidate Database.php from front-backend and formulaire into unified shared/Database.php
  - Smart path detection for test.db vs posterg.db
  - Secure search with wildcard escaping and input validation
  - Support both singleton and direct instantiation patterns
  - Full CRUD methods for admin functionality
- Move RateLimit.php to shared/ (30 requests/min)
- Update all require paths across apps to use shared/

Phase 2: Reorganize directory structure
- Rename front-backend/ → apps/public/
- Rename formulaire/ → apps/admin/
- Rename db/ → database/
- Update all file paths for new structure
- Create root .gitignore excluding databases, cache, logs

Implement secure search feature
- Add apps/public/search.php with full-text search across theses
- Search filters: query, year, orientation, AP program, keywords
- Security features:
  - SQL injection prevention (prepared statements)
  - Wildcard injection prevention (escape % and _)
  - Input validation (max 200 chars, year range 1900-2100)
  - Rate limiting (30 req/min per IP)
  - Pagination limited to 100 results/page
  - XSS protection (htmlspecialchars on output)

Add comprehensive test suite
- Create apps/public/tests/ with proper structure
  - tests/Integration/SearchTest.php - 12 search scenarios
  - tests/Security/SecurityTest.php - vulnerability testing
  - tests/Unit/RateLimitTest.php - rate limit behavior
- Create database/fixtures/CreateTestDatabase.php
- Add apps/public/run-tests.php test runner
- All tests passing (4/4 suites)

Update deployment configuration
- Rename justfile 'sync' recipe to 'deploy'
- Create deploy group with separate deploy-public and deploy-admin
- Add test-deploy recipe for test database
- Exclude *.db, tests/, cache/, *.md from production deploy
- Deploy shared/ to both public and admin locations

Stats: +4482 insertions, -654 deletions across 72 files
244
database/README.md
Normal file
@@ -0,0 +1,244 @@
# Post-ERG Thesis Database Schema

SQLite database schema for managing final thesis projects (TFE) and doctoral theses at ERG.

## Overview

This schema supports all requirements from the technical specifications (`posterg_fiche-technique.md`):

- Multiple metadata categories (orientation, AP, finality, languages, formats, keywords)
- Multiple authors and supervisors per thesis
- Access control (Libre/Interne/Interdit)
- Licensing management
- File uploads (main TFE, annexes, written parts)
- Jury notes and points
- Publication workflow (submission → defense → publication)
- Editable static pages (charte, about, licenses, contact)
- Distinction between TFEs and doctoral theses

## Database Structure

### Core Tables

**`theses`** - Main thesis information
- Basic metadata (title, subtitle, year, identifier)
- Academic details (orientation, AP program, finality)
- Content (synopsis, jury notes, duration/size)
- Access control and licensing
- Publication workflow status

**`authors`** - Student/author information
- Name and contact email

**`supervisors`** - Thesis promoters
- Name of supervisor/promoter

**`thesis_files`** - Uploaded files
- Main TFE, annexes, written parts
- File metadata (path, size, MIME type)

**`pages`** - Static content pages
- Charte, about, licenses, contact pages
- Easily editable content

### Reference Tables (Predefined Lists)

- `orientations` - Arts Numériques, Dessin, Cinéma d'animation, etc.
- `ap_programs` - Narration Spéculative, DPM, APS, LIENS
- `finality_types` - Approfondi, Enseignement, Spécialisé
- `languages` - Français, Anglais, etc. (expandable)
- `format_types` - Site web, Audio, Vidéo, Performance, etc.
- `keywords` - Dynamic, expandable keyword list (max 10 per thesis)
- `access_types` - Libre, Interne, Interdit
- `license_types` - To be defined

### Junction Tables (Many-to-Many)

- `thesis_authors` - Links theses to authors
- `thesis_supervisors` - Links theses to supervisors
- `thesis_languages` - Multiple languages per thesis
- `thesis_formats` - Multiple formats per thesis
- `thesis_keywords` - Max 10 keywords per thesis

## Key Features

### 1. Flexible Metadata
- Multiple authors, supervisors, languages, formats, and keywords per thesis
- Predefined lists with ability to add new entries
- Proper normalization to avoid data duplication

### 2. Access Control
Three levels of access as specified:
- **Libre**: Freely accessible online and in library
- **Interne**: Physical access only, descriptive note online
- **Interdit**: No physical/online access, descriptive note only

**Important**: Access can be restricted but never opened (as per specs)
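
The one-way rule can be expressed as a small check in application code. The following is a minimal sketch, not the project's implementation: the `ACCESS_RANK` mapping and the `can_change_access` function are hypothetical names, and the ranking simply follows the order of the three levels described above.

```python
# Hypothetical ranking: a higher number means a more restrictive level.
ACCESS_RANK = {"Libre": 0, "Interne": 1, "Interdit": 2}


def can_change_access(current: str, requested: str) -> bool:
    """Return True only if the change keeps or tightens the restriction."""
    return ACCESS_RANK[requested] >= ACCESS_RANK[current]


# Restricting is allowed, reopening is not.
libre_to_interne = can_change_access("Libre", "Interne")
interdit_to_libre = can_change_access("Interdit", "Libre")
```

Comparing ranks rather than listing allowed pairs keeps the rule valid if a further level is ever added to `access_types`.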

### 3. Publication Workflow
The schema tracks the complete lifecycle:

1. **Submission** (`submitted_at`) - Student submits TFE
2. **Defense** (`defense_date`) - Soutenance takes place
3. **Jury Review** (`jury_note_added`, `jury_points`, `context_note`)
4. **Publication** (`published_at`, `is_published = 1`)

**Important**: TFEs are NOT published immediately upon submission. They must wait for:
- Defense to occur
- Jury to add optional context note (max 150 words)
- Jury points to be recorded
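
The waiting conditions above could be checked before setting `is_published = 1`. This is a sketch under stated assumptions: the `ThesisStatus` class and `can_publish` function are hypothetical, the field names mirror the columns named in the workflow, and the context note is left out of the check because it is optional.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ThesisStatus:
    """Hypothetical container for the workflow columns of one thesis."""
    submitted_at: Optional[date]
    defense_date: Optional[date]
    jury_points: Optional[float]


def can_publish(thesis: ThesisStatus, today: date) -> bool:
    """A thesis is publishable once it was submitted, defended, and graded."""
    if thesis.submitted_at is None or thesis.defense_date is None:
        return False
    if thesis.defense_date > today:
        return False  # the defense has not taken place yet
    return thesis.jury_points is not None


today = date(2025, 7, 1)
graded = ThesisStatus(date(2025, 5, 1), date(2025, 6, 15), 15.0)
ungraded = ThesisStatus(date(2025, 5, 1), date(2025, 6, 15), None)
```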

### 4. File Management
Support for multiple file types per thesis:
- Main TFE work
- Annexes
- Written part
- Other supporting files

### 5. Views for Easy Querying

**`v_theses_full`** - Complete thesis information with all related data
- Joins all tables
- Concatenates multiple values (authors, supervisors, keywords, etc.)
- Use for backend/admin interfaces

**`v_theses_public`** - Only published theses
- Filtered to `is_published = 1`
- Use for public-facing website
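
To illustrate how such a view can filter on `is_published` and concatenate related values, here is a reduced sketch run against an in-memory database. The table definitions are simplified assumptions (the real `schema.sql` has more columns and tables), and the sample rows are placeholders, not real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified stand-ins for the real tables (assumed columns only).
CREATE TABLE theses (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    year INTEGER,
    is_published INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE thesis_authors (
    thesis_id INTEGER REFERENCES theses(id),
    author_id INTEGER REFERENCES authors(id),
    PRIMARY KEY (thesis_id, author_id)
);

-- One row per published thesis, with its authors joined into one string.
CREATE VIEW v_theses_public AS
SELECT t.id, t.title, t.year, GROUP_CONCAT(a.name, ', ') AS authors
FROM theses t
LEFT JOIN thesis_authors ta ON ta.thesis_id = t.id
LEFT JOIN authors a ON a.id = ta.author_id
WHERE t.is_published = 1
GROUP BY t.id;

-- Placeholder rows for the demonstration.
INSERT INTO theses (id, title, year, is_published) VALUES
    (1, 'Published work', 2025, 1),
    (2, 'Pending work', 2025, 0);
INSERT INTO authors (id, name) VALUES (1, 'Author One'), (2, 'Author Two');
INSERT INTO thesis_authors (thesis_id, author_id) VALUES (1, 1), (1, 2), (2, 1);
""")

rows = conn.execute("SELECT title, authors FROM v_theses_public").fetchall()
```

Only the published thesis appears in `rows`, with both author names in a single column. SQLite does not guarantee the order of values inside `GROUP_CONCAT`, so code that depends on author order would need a different approach.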

## Usage

### Initialize Database

```bash
sqlite3 posterg.db < schema.sql
```

### Example Queries

#### Get all published theses from 2025
```sql
SELECT * FROM v_theses_public WHERE year = 2025;
```

#### Get theses by orientation
```sql
SELECT * FROM v_theses_full
WHERE orientation = 'Vidéographie';
```

#### Get theses with specific keyword
```sql
SELECT t.* FROM v_theses_full t
JOIN thesis_keywords tk ON t.id = tk.thesis_id
JOIN keywords k ON tk.keyword_id = k.id
WHERE k.keyword = 'performance';
```
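
When the keyword comes from user input, the value should be bound as a parameter rather than concatenated into the SQL string, and `%` and `_` should be escaped if a `LIKE` match is used. The sketch below assumes a reduced `keywords` table with placeholder rows; `escape_like` and `find_keywords` are hypothetical helper names.

```python
import sqlite3


def escape_like(term: str) -> str:
    """Escape LIKE wildcards so user input is matched literally."""
    return term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def find_keywords(conn: sqlite3.Connection, term: str) -> list:
    """Return keywords containing `term`, using a bound parameter."""
    pattern = "%" + escape_like(term) + "%"
    sql = (
        "SELECT keyword FROM keywords "
        "WHERE keyword LIKE ? ESCAPE '\\' ORDER BY keyword"
    )
    return [row[0] for row in conn.execute(sql, (pattern,))]


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE keywords (id INTEGER PRIMARY KEY, keyword TEXT NOT NULL UNIQUE);
-- Placeholder rows for the demonstration.
INSERT INTO keywords (keyword) VALUES ('performance'), ('100% recyclé'), ('100 ans');
""")

literal_percent = find_keywords(conn, "100%")
```

Without the escaping step, the `%` typed by the user would act as a wildcard and the search for `100%` would also return `100 ans`.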

#### Get theses awaiting publication (submitted but not published)
```sql
SELECT * FROM theses
WHERE submitted_at IS NOT NULL
AND is_published = 0;
```

#### Update access type (can only restrict, not open)
```sql
-- Allowed: from Libre to Interne
UPDATE theses SET access_type_id = 2 WHERE id = 1;

-- Not allowed per specs: from Interdit to Libre
-- This should be enforced in application logic
```
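
As an optional complement to the application-level check, the same rule could also be guarded inside the database with a trigger. This is only a sketch and not part of the documented schema: the trigger name is hypothetical, the `theses` table is reduced to three columns, and it assumes that `access_type_id` values grow with restrictiveness (1 = Libre, 2 = Interne, 3 = Interdit).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Reduced stand-in for the real table (assumed columns only).
CREATE TABLE theses (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    access_type_id INTEGER NOT NULL DEFAULT 1
);

-- Hypothetical trigger: reject any update that lowers the access level id.
CREATE TRIGGER trg_access_only_restricts
BEFORE UPDATE OF access_type_id ON theses
FOR EACH ROW
WHEN NEW.access_type_id < OLD.access_type_id
BEGIN
    SELECT RAISE(ABORT, 'access can be restricted but never opened');
END;

INSERT INTO theses (id, title, access_type_id) VALUES (1, 'Placeholder title', 1);
""")

# Allowed: Libre (1) to Interne (2).
conn.execute("UPDATE theses SET access_type_id = 2 WHERE id = 1")

# Rejected: Interne (2) back to Libre (1).
try:
    conn.execute("UPDATE theses SET access_type_id = 1 WHERE id = 1")
    reopened = True
except sqlite3.DatabaseError:
    reopened = False

current_level = conn.execute(
    "SELECT access_type_id FROM theses WHERE id = 1"
).fetchone()[0]
```

The trigger only works if the ids are ordered as assumed; if the `access_types` table uses a different numbering, the comparison would need an explicit rank column instead.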

## Data Import Notes

Based on `Database_TFE_test.csv`:

### Current CSV Structure
- Identifiant (e.g., "2025-002")
- Titre, Sous-titre
- Auteur·ice(s) - comma-separated if multiple
- Contact - email
- Promoteur·ice(s) - comma-separated if multiple
- Format - comma-separated if multiple
- Année
- AP - abbreviation (DPM, LIENS, etc.)
- Orientation - abbreviation (SC, VI, CA, etc.)
- Finalité
- Mots-clés - comma-separated, max 10
- Synopsis
- Contexte - jury context note
- Remarques - internal notes
- Langue - language(s)
- Autorisation - access type
- License - license type
- taille - duration/size info
- Points sur 20 - jury points
- lien BAIU - institutional repository link

### Import Considerations

1. **Parse comma-separated values** for:
   - Authors (split and create entries in `authors` table)
   - Supervisors (split and create entries in `supervisors` table)
   - Formats (map to `format_types`)
   - Keywords (split and create/link in `keywords`)
   - Languages (split and map to `languages`)

2. **Map abbreviations**:
   - Orientations: SC → Sculpture, VI → Vidéographie, CA → Cinéma d'animation, etc.
   - AP: DPM, LIENS, APS (exact match)

3. **Handle missing data**:
   - Some fields in CSV are empty (AP, Orientation for some entries)
   - Use NULL in database

4. **Parse duration/size**:
   - Examples: "128 pages", "78 pages + ?? minutes", "68 minutes"
   - Extract numeric values for `duration_pages` and `duration_minutes`
   - Store original string in `file_size_info`
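
The parsing steps above can be sketched as a few small helpers. These are illustrative only: the function names are hypothetical, the orientation mapping contains just the three abbreviations listed above (the full list would come from the specifications), and unknown abbreviations are rejected rather than guessed.

```python
import re
from typing import Dict, List, Optional

# Only the abbreviations named in this README; the real mapping is longer.
ORIENTATION_BY_CODE = {
    "SC": "Sculpture",
    "VI": "Vidéographie",
    "CA": "Cinéma d'animation",
}


def split_multi(cell: str) -> List[str]:
    """Split a comma-separated CSV cell, dropping empty parts."""
    return [part.strip() for part in cell.split(",") if part.strip()]


def map_orientation(code: str) -> Optional[str]:
    """Map an abbreviation to its full name; an empty cell becomes None (NULL)."""
    code = code.strip()
    if not code:
        return None
    if code not in ORIENTATION_BY_CODE:
        raise ValueError(f"unknown orientation abbreviation: {code!r}")
    return ORIENTATION_BY_CODE[code]


def parse_duration(raw: str) -> Dict[str, object]:
    """Extract page and minute counts, keeping the original string."""
    pages = re.search(r"(\d+)\s*pages?", raw)
    minutes = re.search(r"(\d+)\s*minutes?", raw)
    return {
        "duration_pages": int(pages.group(1)) if pages else None,
        "duration_minutes": int(minutes.group(1)) if minutes else None,
        "file_size_info": raw,
    }


mixed = parse_duration("78 pages + ?? minutes")
```

For the mixed example, `duration_pages` is 78 and `duration_minutes` stays `None`, because `??` carries no numeric value; the untouched string is still available in `file_size_info`.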

## Schema Design Decisions

### Why SQLite?
- Self-contained, serverless
- Easy to backup (single file)
- Good performance for this use case
- Simple to integrate with various tools

### Normalization Level
- 3rd Normal Form (3NF) for most tables
- Denormalized views for read performance
- Balance between flexibility and simplicity

### Extensibility
- New languages can be added via `languages` table
- Keywords are dynamic and grow with content
- License types can be defined later
- Static pages can be added via `pages` table

### Constraints
- CASCADE deletes on junction tables
- UNIQUE constraints on lookup table names
- NOT NULL on critical fields
- Automatic timestamps via triggers
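
A point worth keeping in mind with the CASCADE deletes is that SQLite only enforces foreign keys when `PRAGMA foreign_keys = ON` is set on the connection. The sketch below demonstrates this together with a UNIQUE constraint, using reduced table definitions that are assumptions rather than excerpts from `schema.sql`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Foreign key enforcement is off by default and must be enabled per connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Reduced stand-ins for the real tables (assumed columns only).
CREATE TABLE theses (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE keywords (id INTEGER PRIMARY KEY, keyword TEXT NOT NULL UNIQUE);
CREATE TABLE thesis_keywords (
    thesis_id INTEGER NOT NULL REFERENCES theses(id) ON DELETE CASCADE,
    keyword_id INTEGER NOT NULL REFERENCES keywords(id) ON DELETE CASCADE,
    PRIMARY KEY (thesis_id, keyword_id)
);

-- Placeholder rows for the demonstration.
INSERT INTO theses (id, title) VALUES (1, 'Placeholder title');
INSERT INTO keywords (id, keyword) VALUES (1, 'performance');
INSERT INTO thesis_keywords (thesis_id, keyword_id) VALUES (1, 1);
""")

# Deleting the thesis removes its junction rows as well.
conn.execute("DELETE FROM theses WHERE id = 1")
remaining_links = conn.execute("SELECT COUNT(*) FROM thesis_keywords").fetchone()[0]

# The UNIQUE constraint rejects a second 'performance' keyword.
try:
    conn.execute("INSERT INTO keywords (keyword) VALUES ('performance')")
    duplicate_accepted = True
except sqlite3.IntegrityError:
    duplicate_accepted = False
```

If the pragma line is left out, the delete still succeeds but the junction row is left behind, which is an easy mistake to miss in application code that opens its own connections.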

## Important Business Rules

1. **No immediate publication**: TFEs must go through defense before publication
2. **Access restriction is one-way**: Can restrict but not open access
3. **Max 10 keywords** per thesis (enforce in application)
4. **Jury context note max 150 words** (enforce in application)
5. **Synopsis ~200 words** (guideline, not hard limit)
6. **Multiple selections allowed** for: languages, formats, authors, supervisors, keywords
7. **Doctoral theses**: Use `is_doctoral = 1` to distinguish from TFEs
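
Rules 3 and 4 are left to the application, so a validation step along the following lines could run before saving a thesis. This is a minimal sketch: `validate_submission` is a hypothetical function, and words are counted by splitting on whitespace, which is one possible reading of the 150-word limit.

```python
from typing import List

MAX_KEYWORDS = 10
MAX_CONTEXT_WORDS = 150


def validate_submission(keywords: List[str], context_note: str) -> List[str]:
    """Return a list of rule violations; an empty list means the input is valid."""
    errors = []
    if len(keywords) > MAX_KEYWORDS:
        errors.append(f"too many keywords: {len(keywords)} (max {MAX_KEYWORDS})")
    word_count = len(context_note.split())
    if word_count > MAX_CONTEXT_WORDS:
        errors.append(f"context note too long: {word_count} words (max {MAX_CONTEXT_WORDS})")
    return errors


ok = validate_submission(["performance", "vidéo"], "A short note.")
too_much = validate_submission([f"kw{i}" for i in range(11)], "word " * 151)
```

Returning all violations at once, rather than raising on the first one, lets an admin form show every problem in a single pass.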

## Next Steps

1. Create import script to load CSV data
2. Define license types
3. Build backend API for CRUD operations
4. Implement authorization checks
5. Create admin interface for easy editing
6. Build public-facing website using views