xamxam/database/SETUP.md
Théophile Gervreau-Mercier 467aced734 Restructure repository and implement secure search feature
Phase 1: Consolidate shared infrastructure
- Create shared/ directory for common code
- Consolidate Database.php from front-backend and formulaire into unified shared/Database.php
  - Smart path detection for test.db vs posterg.db
  - Secure search with wildcard escaping and input validation
  - Support both singleton and direct instantiation patterns
  - Full CRUD methods for admin functionality
- Move RateLimit.php to shared/ (30 requests/min)
- Update all require paths across apps to use shared/

Phase 2: Reorganize directory structure
- Rename front-backend/ → apps/public/
- Rename formulaire/ → apps/admin/
- Rename db/ → database/
- Update all file paths for new structure
- Create root .gitignore excluding databases, cache, logs

Implement secure search feature
- Add apps/public/search.php with full-text search across theses
- Search filters: query, year, orientation, AP program, keywords
- Security features:
  - SQL injection prevention (prepared statements)
  - Wildcard injection prevention (escape % and _)
  - Input validation (max 200 chars, year range 1900-2100)
  - Rate limiting (30 req/min per IP)
  - Pagination limited to 100 results/page
  - XSS protection (htmlspecialchars on output)

Add comprehensive test suite
- Create apps/public/tests/ with proper structure
  - tests/Integration/SearchTest.php - 12 search scenarios
  - tests/Security/SecurityTest.php - vulnerability testing
  - tests/Unit/RateLimitTest.php - rate limit behavior
- Create database/fixtures/CreateTestDatabase.php
- Add apps/public/run-tests.php test runner
- All tests passing (4/4 suites)

Update deployment configuration
- Rename justfile 'sync' recipe to 'deploy'
- Create deploy group with separate deploy-public and deploy-admin
- Add test-deploy recipe for test database
- Exclude *.db, tests/, cache/, *.md from production deploy
- Deploy shared/ to both public and admin locations

Stats: +4482 insertions, -654 deletions across 72 files
2026-02-02 18:53:58 +01:00


# Post-ERG Thesis Database - Setup Guide
Complete guide for setting up and managing the SQLite database for the Post-ERG thesis archive platform.
---
## Table of Contents
1. [Quick Start](#quick-start)
2. [Prerequisites](#prerequisites)
3. [Database Setup](#database-setup)
4. [Schema Overview](#schema-overview)
5. [Detailed Schema Description](#detailed-schema-description)
6. [Common Operations](#common-operations)
7. [Backup & Maintenance](#backup--maintenance)
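8. [Troubleshooting](#troubleshooting)
9. [Advanced Tips](#advanced-tips)
10. [Next Steps](#next-steps)
11. [Resources](#resources)
12. [Support](#support)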
---
## Quick Start
For the impatient, here's the fastest way to get started:
```bash
# Navigate to the database directory
cd /home/padlock/dev/posterg/db
# Create the database and apply schema
sqlite3 posterg.db < schema.sql
# Verify the database was created
sqlite3 posterg.db "SELECT name FROM sqlite_master WHERE type='table';"
# Check predefined data was loaded
sqlite3 posterg.db "SELECT * FROM orientations;"
```
You now have a fully initialized Post-ERG thesis database!
---
## Prerequisites
### Required Software
- **SQLite 3** (version 3.8.0 or higher recommended)
  - Check version: `sqlite3 --version`
  - Install on Linux: `sudo apt-get install sqlite3`
  - Install on macOS: `brew install sqlite3` (usually pre-installed)
  - Install on Windows: Download from [sqlite.org/download.html](https://sqlite.org/download.html)
### Optional Tools
- **DB Browser for SQLite** - GUI tool for database management
  - Download: [sqlitebrowser.org](https://sqlitebrowser.org/)
  - Great for visual exploration and testing
- **sqlite-web** - Web-based SQLite database browser
  ```bash
  pip install sqlite-web
  sqlite_web posterg.db
  ```
---
## Database Setup
### Step 1: Project Structure
Ensure your directory structure looks like this:
```
/home/padlock/dev/posterg/db/
├── schema.sql # Database schema definition
├── Database_TFE_test.csv # Sample/test CSV data
├── posterg_fiche-technique.md # Technical specifications
├── SETUP.md # This file
├── README.md # Schema documentation
└── posterg.db # Database file (created in next step)
```
### Step 2: Create the Database
Create an empty SQLite database and apply the schema:
```bash
# Method 1: Using shell redirection (recommended)
sqlite3 posterg.db < schema.sql
# Method 2: Interactive mode
sqlite3 posterg.db
sqlite> .read schema.sql
sqlite> .quit
# Method 3: One-liner
cat schema.sql | sqlite3 posterg.db
```
### Step 3: Verify Installation
Check that all tables were created successfully:
```bash
sqlite3 posterg.db <<EOF
-- List all tables
.tables
-- Count tables (should be ~20)
SELECT COUNT(*) as table_count
FROM sqlite_master
WHERE type='table';
-- Show schema for main theses table
.schema theses
-- Verify predefined data
SELECT COUNT(*) FROM orientations; -- Should return 15
SELECT COUNT(*) FROM ap_programs; -- Should return 4
SELECT COUNT(*) FROM finality_types; -- Should return 3
SELECT COUNT(*) FROM access_types; -- Should return 3
SELECT COUNT(*) FROM pages; -- Should return 4
EOF
```
Expected output:
- 15 orientations
- 4 AP programs
- 3 finality types
- 3 access types
- 4 static pages (charte, about, licenses, contact)
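The same check can be scripted. Below is a minimal Python sketch, assuming only the table names and counts listed above; the helper name `find_count_mismatches` is hypothetical, and the demo runs against an in-memory stand-in rather than the real `posterg.db`:

```python
import sqlite3

# Expected row counts, copied from the list above.
EXPECTED_COUNTS = {
    "orientations": 15,
    "ap_programs": 4,
    "finality_types": 3,
    "access_types": 3,
    "pages": 4,
}


def find_count_mismatches(conn, expected=EXPECTED_COUNTS):
    """Return {table: (actual, expected)} for tables whose row count differs."""
    mismatches = {}
    for table, want in expected.items():
        # Table names come from the fixed dict above, never from user input.
        actual = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
        if actual != want:
            mismatches[table] = (actual, want)
    return mismatches


# Demo on an in-memory stand-in (assumption: the real tables have more columns).
conn = sqlite3.connect(":memory:")
for table, rows in EXPECTED_COUNTS.items():
    conn.execute(f'CREATE TABLE "{table}" (id INTEGER PRIMARY KEY)')
    for _ in range(rows):
        conn.execute(f'INSERT INTO "{table}" DEFAULT VALUES')
conn.execute("DELETE FROM pages WHERE id = 4")  # simulate a missing seed row

mismatches = find_count_mismatches(conn)
print(mismatches)
```

For real use, pass `sqlite3.connect("posterg.db")` instead of the in-memory connection; an empty dict means every count matches.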
### Step 4: Test the Database
Run a simple test query:
```bash
sqlite3 posterg.db "SELECT name FROM orientations LIMIT 5;"
```
Should output:
```
Arts Numériques
Dessin
Cinéma d'animation
Installation-Performance
Peinture
```
---
## Schema Overview
### Database Architecture
The Post-ERG database uses a **relational model** with proper normalization (3NF) to ensure:
- Data integrity
- No redundancy
- Flexible querying
- Easy maintenance
### Entity-Relationship Diagram (Conceptual)
```
┌─────────────┐       ┌──────────────┐       ┌──────────────┐
│   Authors   │───────│Thesis Authors│───────│    Theses    │
└─────────────┘  1:N  └──────────────┘  N:1  └──────────────┘
                                                    │ N:1
┌─────────────┐       ┌──────────────┐              │
│ Supervisors │───────│Thesis Supvrs │──────────────┤
└─────────────┘  1:N  └──────────────┘              │
┌─────────────┐       ┌──────────────┐              │
│  Keywords   │───────│Thesis Keywrds│──────────────┤
└─────────────┘  1:N  └──────────────┘              │
┌─────────────┐       ┌──────────────┐              │
│  Languages  │───────│Thesis Langs  │──────────────┤
└─────────────┘  1:N  └──────────────┘              │
┌─────────────┐       ┌──────────────┐              │
│Format Types │───────│Thesis Formats│──────────────┤
└─────────────┘  1:N  └──────────────┘              │
┌─────────────┐                                     │
│Orientations │─────────────────────────────────────┤
└─────────────┘  N:1                                │
┌─────────────┐                                     │
│ AP Programs │─────────────────────────────────────┤
└─────────────┘  N:1                                │
┌─────────────┐                                     │
│Finality Type│─────────────────────────────────────┤
└─────────────┘  N:1                                │
┌─────────────┐                                     │
│Access Types │─────────────────────────────────────┤
└─────────────┘  N:1                                │
┌─────────────┐                                     │
│License Types│─────────────────────────────────────┤
└─────────────┘  N:1                                │
┌─────────────┐                                     │
│Thesis Files │─────────────────────────────────────┘
└─────────────┘  1:N
```
### Table Categories
**1. Core Tables** (5 tables)
- `theses` - Main thesis records
- `authors` - Student/author information
- `supervisors` - Thesis promoters
- `thesis_files` - File uploads
- `pages` - Static content pages
**2. Reference/Lookup Tables** (8 tables)
- `orientations` - Academic orientations
- `ap_programs` - AP programs (ateliers)
- `finality_types` - Master finality
- `languages` - Thesis languages
- `format_types` - Work formats
- `keywords` - Thesis keywords
- `access_types` - Access levels
- `license_types` - License options
**3. Junction Tables** (5 tables)
- `thesis_authors` - Many-to-many: theses ↔ authors
- `thesis_supervisors` - Many-to-many: theses ↔ supervisors
- `thesis_languages` - Many-to-many: theses ↔ languages
- `thesis_formats` - Many-to-many: theses ↔ formats
- `thesis_keywords` - Many-to-many: theses ↔ keywords
**4. Views** (2 views)
- `v_theses_full` - Complete thesis data (admin view)
- `v_theses_public` - Published theses only (public view)
---
## Detailed Schema Description
### Core Tables
#### `theses` - The Heart of the Database
The central table storing all thesis metadata and state information.
**Purpose**: Store all thesis projects (TFE and doctoral theses) with complete metadata, publication workflow state, and access control.
**Columns**:
| Column | Type | Description |
|--------|------|-------------|
| `id` | INTEGER PK | Unique identifier (auto-increment) |
| `identifier` | TEXT UNIQUE | Human-readable ID (e.g., "2025-002") |
| **Basic Information** |||
| `title` | TEXT NOT NULL | Thesis title |
| `subtitle` | TEXT | Optional subtitle |
| `year` | INTEGER NOT NULL | Year of submission/defense |
| `is_doctoral` | BOOLEAN | 0=TFE, 1=Doctoral thesis |
| **Academic Details** |||
| `orientation_id` | INTEGER FK | Links to `orientations` table |
| `ap_program_id` | INTEGER FK | Links to `ap_programs` table |
| `finality_id` | INTEGER FK | Links to `finality_types` table |
| **Content** |||
| `synopsis` | TEXT | ~200 word description |
| `context_note` | TEXT | Jury president note (max 150 words) |
| `remarks` | TEXT | Internal remarks/notes |
| **Duration/Size** |||
| `duration_minutes` | INTEGER | For audio/video works |
| `duration_pages` | INTEGER | For written works |
| `file_size_info` | TEXT | Free-form size description |
| **Access & Licensing** |||
| `access_type_id` | INTEGER FK | Libre/Interne/Interdit |
| `license_id` | INTEGER FK | Links to `license_types` |
| **Jury Information** |||
| `jury_points` | DECIMAL(4,2) | Points out of 20 (e.g., 15.50) |
| `jury_note_added` | BOOLEAN | Whether jury added context note |
| **Publication Workflow** |||
| `submitted_at` | DATETIME | When student submitted |
| `defense_date` | DATETIME | Date of soutenance |
| `published_at` | DATETIME | When made public |
| `is_published` | BOOLEAN | Publication status flag |
| **External Links** |||
| `baiu_link` | TEXT | Link to BAIU repository |
| **Timestamps** |||
| `created_at` | DATETIME | Record creation (auto) |
| `updated_at` | DATETIME | Last modification (auto-updated) |
**Indexes**:
- `idx_theses_year` - Fast filtering by year
- `idx_theses_published` - Quick access to published theses
- `idx_theses_identifier` - Fast lookup by identifier
- `idx_theses_orientation` - Filter by orientation
- `idx_theses_ap_program` - Filter by AP program
- `idx_theses_access_type` - Filter by access level
**Business Logic**:
1. **Publication Workflow**:
   ```
   Student submits → submitted_at set
   Defense happens → defense_date set
   Jury reviews → jury_points + context_note added
   Publication → published_at set, is_published = 1
   ```
2. **Access Control**:
   - Libre: Full access everywhere
   - Interne: Physical only, note online
   - Interdit: No access, note online only
   - **Important**: Can only restrict, never open
#### `authors` - Student Information
**Purpose**: Store unique author/student records to avoid duplication.
**Columns**:
| Column | Type | Description |
|--------|------|-------------|
| `id` | INTEGER PK | Unique author ID |
| `name` | TEXT NOT NULL | Full name |
| `email` | TEXT | Contact email (optional) |
| `created_at` | DATETIME | When added |
| `updated_at` | DATETIME | Last modified |
**Indexed**: `email` for fast lookup
**Relationships**: One author can have multiple theses (via `thesis_authors`)
#### `supervisors` - Thesis Promoters
**Purpose**: Store unique supervisor/promoter records.
**Columns**:
| Column | Type | Description |
|--------|------|-------------|
| `id` | INTEGER PK | Unique supervisor ID |
| `name` | TEXT NOT NULL | Full name |
| `created_at` | DATETIME | When added |
| `updated_at` | DATETIME | Last modified |
**Relationships**: One supervisor can supervise multiple theses (via `thesis_supervisors`)
#### `thesis_files` - File Attachments
**Purpose**: Track all uploaded files associated with a thesis.
**Columns**:
| Column | Type | Description |
|--------|------|-------------|
| `id` | INTEGER PK | Unique file ID |
| `thesis_id` | INTEGER FK | Links to `theses` |
| `file_type` | TEXT | 'main', 'annex', 'written_part', 'other' |
| `file_path` | TEXT NOT NULL | Server path to file |
| `file_name` | TEXT NOT NULL | Original filename |
| `file_size` | INTEGER | Size in bytes |
| `mime_type` | TEXT | MIME type (e.g., 'application/pdf') |
| `description` | TEXT | Optional file description |
| `uploaded_at` | DATETIME | Upload timestamp |
**File Types**:
- `main` - The primary TFE work
- `annex` - Supporting materials
- `written_part` - Written thesis component
- `other` - Miscellaneous files
**Cascade Delete**: When a thesis is deleted, its `thesis_files` rows are deleted automatically. Only the database records are removed; the files on disk stay in place, so the application must delete them separately.
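One way to handle the on-disk files is sketched below in Python. It assumes that `file_path` holds a path the application is allowed to delete and that foreign keys are enabled so the cascade fires; the helper name `delete_thesis_with_files` is hypothetical, and the in-memory schema is a trimmed stand-in for the real one:

```python
import os
import sqlite3
import tempfile


def delete_thesis_with_files(conn, thesis_id):
    """Delete a thesis row, then remove its files from disk.

    Paths are read before the DELETE because the cascade removes
    the thesis_files rows that hold them.
    """
    paths = [row[0] for row in conn.execute(
        "SELECT file_path FROM thesis_files WHERE thesis_id = ?", (thesis_id,))]
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM theses WHERE id = ?", (thesis_id,))
    removed = []
    for path in paths:
        if os.path.exists(path):
            os.remove(path)
            removed.append(path)
    return removed


# Demo against a trimmed, in-memory stand-in for the real schema.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for ON DELETE CASCADE
conn.executescript("""
    CREATE TABLE theses (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE thesis_files (
        id INTEGER PRIMARY KEY,
        thesis_id INTEGER NOT NULL REFERENCES theses(id) ON DELETE CASCADE,
        file_path TEXT NOT NULL
    );
""")
tmp = tempfile.NamedTemporaryFile(suffix=".pdf", delete=False)
tmp.close()
conn.execute("INSERT INTO theses (id, title) VALUES (1, 'Demo')")
conn.execute(
    "INSERT INTO thesis_files (thesis_id, file_path) VALUES (1, ?)", (tmp.name,))
conn.commit()

removed_paths = delete_thesis_with_files(conn, 1)
remaining_rows = conn.execute("SELECT COUNT(*) FROM thesis_files").fetchone()[0]
```

Deleting the row first and the files second means a failed file removal leaves an orphaned file rather than a database record pointing at nothing; the opposite order is an equally valid choice depending on which failure is easier to clean up.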
#### `pages` - Static Content Management
**Purpose**: Editable static pages for the website (charte, about, etc.).
**Columns**:
| Column | Type | Description |
|--------|------|-------------|
| `id` | INTEGER PK | Unique page ID |
| `slug` | TEXT UNIQUE | URL-friendly identifier |
| `title` | TEXT NOT NULL | Page title |
| `content` | TEXT | Markdown/HTML content |
| `is_published` | BOOLEAN | Visibility flag |
| `created_at` | DATETIME | Creation timestamp |
| `updated_at` | DATETIME | Last modification |
**Pre-loaded Pages**:
- `charte` - Charter/guidelines
- `about` - About the project
- `licenses` - License information
- `contact` - Contact information
**Usage**: Allows non-technical users to edit important static content without touching code.
---
### Reference Tables (Predefined Lists)
#### `orientations` - Academic Orientations
**Purpose**: Predefined list of artistic/academic orientations at ERG.
**Pre-loaded Values** (15 total):
1. Arts Numériques
2. Dessin
3. Cinéma d'animation
4. Installation-Performance
5. Peinture
6. Photographie
7. Sculpture
8. Vidéographie
9. Graphisme
10. Typographie
11. Design Numérique
12. Illustration
13. Bande-Dessinée
14. Sérigraphie
15. Gravure
**Schema**:
```sql
CREATE TABLE orientations (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL UNIQUE,
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);
```
**Usage**: Each thesis links to one orientation via `theses.orientation_id`.
#### `ap_programs` - Atelier Pratiques (AP)
**Purpose**: Predefined list of AP programs.
**Pre-loaded Values** (4 total):
1. Narration Spéculative
2. Design et Politique du Multiple (DPM)
3. Atelier Pratiques Situées (APS)
4. Lieux, Interdisciplinarités, Écologie, Nécessité, Systèmes (LIENS)
**Schema**:
```sql
CREATE TABLE ap_programs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL UNIQUE,
    code TEXT, -- e.g., 'DPM', 'LIENS'
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);
```
**Usage**: Each thesis can optionally link to one AP program.
#### `finality_types` - Master Finality
**Purpose**: Type of master's degree finality.
**Pre-loaded Values** (3 total):
1. Approfondi
2. Enseignement
3. Spécialisé
**Usage**: Each thesis links to one finality type.
#### `languages` - Thesis Languages
**Purpose**: Languages in which theses are written.
**Pre-loaded Values**:
- Français
- Anglais
**Expandable**: New languages can be added as needed.
**Many-to-Many**: A thesis can be multilingual (via `thesis_languages`).
#### `format_types` - Work Formats
**Purpose**: Physical/digital format of the thesis work.
**Pre-loaded Values** (7 total):
1. Site web
2. Audio
3. Vidéo
4. Performance
5. Objet éditorial
6. Installation
7. Autre
**Many-to-Many**: A thesis can have multiple formats (e.g., "Vidéo + Objet éditorial").
#### `keywords` - Thesis Keywords
**Purpose**: Dynamic, expandable keyword system for categorization.
**Schema**:
```sql
CREATE TABLE keywords (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    keyword TEXT NOT NULL UNIQUE,
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);
```
**Characteristics**:
- Starts empty, grows organically
- No predefined list
- Each keyword is unique across the database
- Max 10 keywords per thesis (enforced in application)
**Many-to-Many**: Via `thesis_keywords` junction table.
#### `access_types` - Access Permissions
**Purpose**: Define how theses can be accessed.
**Pre-loaded Values** (3 types):
| Name | Description |
|------|-------------|
| Libre | Freely accessible online and in physical library |
| Interne | Physical access only; descriptive note online |
| Interdit | No access; descriptive note online only |
**Important Business Rule**: Access can be **restricted** but never **opened**.
- ✅ Allowed: Libre → Interne → Interdit
- ❌ Not allowed: Interdit → Interne or Libre
This must be enforced in application logic.
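As an illustration of what that application-level check could look like, here is a minimal Python sketch. The function name `change_access`, the `RESTRICTION_LEVEL` mapping, and the trimmed in-memory schema are assumptions made for the example, not part of the actual codebase:

```python
import sqlite3

# Ordering from least to most restrictive, taken from the table above.
RESTRICTION_LEVEL = {"Libre": 0, "Interne": 1, "Interdit": 2}


def change_access(conn, thesis_id, new_access):
    """Set a thesis's access type, refusing any change that opens access."""
    row = conn.execute(
        """SELECT a.name FROM theses t
           JOIN access_types a ON a.id = t.access_type_id
           WHERE t.id = ?""", (thesis_id,)).fetchone()
    if row is None:
        raise LookupError(f"no thesis with id {thesis_id} and an access type")
    current = row[0]
    if RESTRICTION_LEVEL[new_access] < RESTRICTION_LEVEL[current]:
        raise ValueError(f"cannot open access from {current} to {new_access}")
    with conn:  # commits on success, rolls back on error
        conn.execute(
            """UPDATE theses
               SET access_type_id = (SELECT id FROM access_types WHERE name = ?)
               WHERE id = ?""", (new_access, thesis_id))


# Demo against a trimmed, in-memory stand-in for the real schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE access_types (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE theses (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        access_type_id INTEGER REFERENCES access_types(id)
    );
    INSERT INTO access_types (name) VALUES ('Libre'), ('Interne'), ('Interdit');
    INSERT INTO theses (id, title, access_type_id) VALUES (10, 'Demo', 1);
""")
change_access(conn, 10, "Interne")      # allowed: restricts access
try:
    change_access(conn, 10, "Libre")    # refused: would open access
except ValueError as exc:
    print(exc)
```

Comparing numeric levels keeps the rule in one place; setting the same level again is treated as a harmless no-op.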
#### `license_types` - Licensing Options
**Purpose**: Legal licensing information for theses.
**Schema**:
```sql
CREATE TABLE license_types (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL UNIQUE,
    description TEXT,
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);
```
**Status**: To be populated later (options still being determined as per specs).
**Potential Values** (examples):
- CC BY 4.0
- CC BY-SA 4.0
- CC BY-NC 4.0
- All Rights Reserved
- Custom License
---
### Junction Tables (Many-to-Many Relationships)
Junction tables enable many-to-many relationships between entities.
#### `thesis_authors` - Thesis ↔ Authors
**Purpose**: Link theses to their authors (can have multiple authors).
**Schema**:
```sql
CREATE TABLE thesis_authors (
    thesis_id INTEGER NOT NULL,
    author_id INTEGER NOT NULL,
    author_order INTEGER DEFAULT 1, -- First author, second author, etc.
    PRIMARY KEY (thesis_id, author_id),
    FOREIGN KEY (thesis_id) REFERENCES theses(id) ON DELETE CASCADE,
    FOREIGN KEY (author_id) REFERENCES authors(id) ON DELETE CASCADE
);
```
**Composite Primary Key**: `(thesis_id, author_id)` ensures no duplicate pairings.
**Ordering**: `author_order` preserves author sequence for citation purposes.
**Example**:
```sql
-- Thesis with 2 authors
INSERT INTO thesis_authors (thesis_id, author_id, author_order) VALUES
(1, 5, 1), -- First author
(1, 8, 2); -- Second author
```
#### `thesis_supervisors` - Thesis ↔ Supervisors
**Purpose**: Link theses to their supervisors/promoters (can have multiple).
**Schema**: Similar to `thesis_authors`, includes `supervisor_order`.
**Example**:
```sql
-- Thesis with co-promoters
INSERT INTO thesis_supervisors (thesis_id, supervisor_id, supervisor_order) VALUES
(1, 3, 1), -- Primary promoter
(1, 7, 2); -- Co-promoter
```
#### `thesis_languages` - Thesis ↔ Languages
**Purpose**: Support multilingual theses.
**Schema**:
```sql
CREATE TABLE thesis_languages (
    thesis_id INTEGER NOT NULL,
    language_id INTEGER NOT NULL,
    PRIMARY KEY (thesis_id, language_id),
    FOREIGN KEY (thesis_id) REFERENCES theses(id) ON DELETE CASCADE,
    FOREIGN KEY (language_id) REFERENCES languages(id) ON DELETE CASCADE
);
```
**Example**:
```sql
-- Bilingual thesis (French + English)
INSERT INTO thesis_languages (thesis_id, language_id) VALUES
(1, 1), -- French
(1, 2); -- English
```
#### `thesis_formats` - Thesis ↔ Formats
**Purpose**: Support multi-format works.
**Example Use Case**: A thesis that is both a video and has an editorial object (book).
```sql
INSERT INTO thesis_formats (thesis_id, format_id) VALUES
(10, 3), -- Video
(10, 5); -- Objet éditorial
```
#### `thesis_keywords` - Thesis ↔ Keywords
**Purpose**: Tag theses with up to 10 keywords for discovery.
**Business Rule**: Maximum 10 keywords per thesis (enforce in application).
**Example**:
```sql
-- Add keywords to a thesis
INSERT INTO thesis_keywords (thesis_id, keyword_id) VALUES
(1, 15), -- "performance"
(1, 22), -- "urbanisme"
(1, 8); -- "sociologie"
```
**Indexed** for fast searching:
- `idx_thesis_keywords_thesis` - Find all keywords for a thesis
- `idx_thesis_keywords_keyword` - Find all theses for a keyword
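Since the 10-keyword limit is left to the application, a sketch of one possible check follows. It is written in Python against a trimmed in-memory schema; the helper name `add_keyword` and the constant `MAX_KEYWORDS` are hypothetical:

```python
import sqlite3

MAX_KEYWORDS = 10  # business rule stated in this guide


def add_keyword(conn, thesis_id, keyword):
    """Link a keyword to a thesis, creating the keyword row if needed."""
    with conn:  # one transaction: commits on success, rolls back on error
        already_linked = conn.execute(
            """SELECT 1 FROM thesis_keywords tk
               JOIN keywords k ON k.id = tk.keyword_id
               WHERE tk.thesis_id = ? AND k.keyword = ?""",
            (thesis_id, keyword)).fetchone()
        if already_linked:
            return
        count = conn.execute(
            "SELECT COUNT(*) FROM thesis_keywords WHERE thesis_id = ?",
            (thesis_id,)).fetchone()[0]
        if count >= MAX_KEYWORDS:
            raise ValueError(
                f"thesis {thesis_id} already has {MAX_KEYWORDS} keywords")
        conn.execute(
            "INSERT OR IGNORE INTO keywords (keyword) VALUES (?)", (keyword,))
        conn.execute(
            """INSERT INTO thesis_keywords (thesis_id, keyword_id)
               SELECT ?, id FROM keywords WHERE keyword = ?""",
            (thesis_id, keyword))


# Demo against a trimmed, in-memory stand-in for the real schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE keywords (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        keyword TEXT NOT NULL UNIQUE
    );
    CREATE TABLE thesis_keywords (
        thesis_id INTEGER NOT NULL,
        keyword_id INTEGER NOT NULL,
        PRIMARY KEY (thesis_id, keyword_id)
    );
""")
for i in range(MAX_KEYWORDS):
    add_keyword(conn, 1, f"keyword-{i}")
try:
    add_keyword(conn, 1, "one-too-many")
    rejected = False
except ValueError:
    rejected = True
```

The limit is checked before the keyword row is created, so a rejected request does not leave an unused keyword behind.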
---
### Views (Simplified Querying)
Views are pre-written queries that act like virtual tables.
#### `v_theses_full` - Complete Thesis Data
**Purpose**: Administrative view with all thesis information in one query.
**What it does**:
- Joins all related tables
- Concatenates multiple values (authors, supervisors, keywords, etc.)
- Displays human-readable names instead of IDs
**Columns**: All thesis metadata plus:
- `authors` - Comma-separated author names
- `supervisors` - Comma-separated supervisor names
- `languages` - Comma-separated language names
- `formats` - Comma-separated format types
- `keywords` - Comma-separated keywords
- Plus all human-readable names (orientation, AP, finality, etc.)
**Usage**:
```sql
-- Get complete info for thesis #5
SELECT * FROM v_theses_full WHERE id = 5;
-- All theses from 2025 in Vidéographie
SELECT * FROM v_theses_full
WHERE year = 2025 AND orientation = 'Vidéographie';
```
**Performance Note**: This is a complex join. Use for admin interfaces, not high-traffic public pages.
#### `v_theses_public` - Published Theses Only
**Purpose**: Public-facing view showing only published theses.
**What it does**:
- Same as `v_theses_full`
- But filtered to `is_published = 1`
**Usage**:
```sql
-- Safe for public website
SELECT * FROM v_theses_public
WHERE year = 2025
ORDER BY title;
```
**Security**: Ensures unpublished theses are never exposed to public.
---
### Automatic Features
#### Auto-Incrementing IDs
Every table with a single-column `id` primary key declares it with `AUTOINCREMENT` (junction tables use composite keys instead):
```sql
id INTEGER PRIMARY KEY AUTOINCREMENT
```
**Benefit**: You never need to specify IDs manually. SQLite handles it.
**Example**:
```sql
-- SQLite automatically assigns id = 1, 2, 3, etc.
INSERT INTO authors (name, email) VALUES ('Alice Néron', 'alice@example.com');
```
#### Automatic Timestamps
**Creation Timestamps**:
```sql
created_at DATETIME DEFAULT CURRENT_TIMESTAMP
```
Automatically set when a record is inserted.
**Update Timestamps**:
Triggers automatically update `updated_at` when records change:
```sql
CREATE TRIGGER update_theses_timestamp
    AFTER UPDATE ON theses
BEGIN
    UPDATE theses SET updated_at = CURRENT_TIMESTAMP WHERE id = NEW.id;
END;
```
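**Benefit**: Creation and last-modification times are recorded without manual date management. Only the latest change is kept, so this is not a full change history.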
**Example**:
```sql
-- created_at is set automatically
INSERT INTO authors (name) VALUES ('Bob Smith');
-- updated_at is set automatically on update
UPDATE authors SET email = 'bob@newmail.com' WHERE id = 1;
```
#### Cascade Deletes
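When you delete a thesis, all related records are automatically removed, provided foreign key enforcement is enabled on the connection (`PRAGMA foreign_keys = ON;`; SQLite leaves it off by default):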
```sql
FOREIGN KEY (thesis_id) REFERENCES theses(id) ON DELETE CASCADE
```
**Affected Tables**:
- `thesis_authors`
- `thesis_supervisors`
- `thesis_languages`
- `thesis_formats`
- `thesis_keywords`
- `thesis_files`
**Example**:
```sql
-- This also deletes all associated authors, keywords, files, etc.
DELETE FROM theses WHERE id = 10;
```
**Warning**: This is permanent and cannot be undone!
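Note that SQLite applies `ON DELETE CASCADE` only when foreign key enforcement is enabled on the connection. The small Python sketch below runs the same delete with enforcement on and off; the trimmed schema and the function name `links_left_after_delete` are assumptions made for the demonstration:

```python
import sqlite3

# Trimmed stand-in for the real schema: one thesis with two keyword links.
SCHEMA = """
CREATE TABLE theses (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE thesis_keywords (
    thesis_id INTEGER NOT NULL REFERENCES theses(id) ON DELETE CASCADE,
    keyword_id INTEGER NOT NULL,
    PRIMARY KEY (thesis_id, keyword_id)
);
INSERT INTO theses (id, title) VALUES (10, 'Demo');
INSERT INTO thesis_keywords (thesis_id, keyword_id) VALUES (10, 1), (10, 2);
"""


def links_left_after_delete(enable_foreign_keys):
    """Delete thesis 10 and return how many thesis_keywords rows remain."""
    conn = sqlite3.connect(":memory:")
    # The setting is per connection, so it is chosen explicitly here.
    conn.execute(
        f"PRAGMA foreign_keys = {'ON' if enable_foreign_keys else 'OFF'}")
    conn.executescript(SCHEMA)
    conn.execute("DELETE FROM theses WHERE id = 10")
    remaining = conn.execute(
        "SELECT COUNT(*) FROM thesis_keywords").fetchone()[0]
    conn.close()
    return remaining


with_enforcement = links_left_after_delete(True)      # cascade removes the links
without_enforcement = links_left_after_delete(False)  # orphaned links remain
print(with_enforcement, without_enforcement)
```

In practice this means every connection the application opens should issue the pragma before doing any deletes.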
---
## Common Operations
### Querying
#### Basic Queries
```bash
# Enter SQLite shell
sqlite3 posterg.db
# List all tables
.tables
# Show table structure
.schema theses
# Pretty output
.mode column
.headers on
# Run a query
SELECT * FROM orientations;
# Exit
.quit
```
#### Find Published Theses
```sql
SELECT title, year, authors, orientation
FROM v_theses_public
WHERE year >= 2024
ORDER BY year DESC, title;
```
#### Search by Keyword
```sql
-- Only keywords matching the pattern appear in the keywords column
SELECT t.title, t.year, GROUP_CONCAT(k.keyword) as keywords
FROM theses t
JOIN thesis_keywords tk ON t.id = tk.thesis_id
JOIN keywords k ON tk.keyword_id = k.id
WHERE k.keyword LIKE '%performance%'
GROUP BY t.id;
```
#### Find Theses by Author
```sql
SELECT t.title, t.year, a.name as author
FROM theses t
JOIN thesis_authors ta ON t.id = ta.thesis_id
JOIN authors a ON ta.author_id = a.id
WHERE a.name LIKE '%Lucie%'
ORDER BY t.year DESC;
```
#### Get Unpublished Theses (Admin)
```sql
SELECT identifier, title, submitted_at, defense_date
FROM theses
WHERE submitted_at IS NOT NULL
AND is_published = 0
ORDER BY submitted_at DESC;
```
### Inserting Data
#### Add a New Author
```sql
INSERT INTO authors (name, email) VALUES
('Marie Dupont', 'marie.dupont@example.com');
```
#### Add a New Thesis (Basic)
```sql
INSERT INTO theses (
    identifier, title, year, orientation_id, finality_id, synopsis
) VALUES (
    '2026-001',
    'Mon Titre de TFE',
    2026,
    8, -- Vidéographie
    1, -- Approfondi
    'Un synopsis fascinant de mon travail...'
);
```
#### Link Thesis to Author
```sql
-- Get thesis ID and author ID first
INSERT INTO thesis_authors (thesis_id, author_id, author_order)
VALUES (1, 5, 1);
```
#### Add Keywords to Thesis
```sql
-- First, ensure keyword exists
INSERT OR IGNORE INTO keywords (keyword) VALUES ('performance');
-- Then link it
INSERT INTO thesis_keywords (thesis_id, keyword_id)
SELECT 1, id FROM keywords WHERE keyword = 'performance';
```
### Updating Data
#### Update Thesis Status to Published
```sql
UPDATE theses
SET is_published = 1,
published_at = CURRENT_TIMESTAMP
WHERE id = 5;
```
#### Add Jury Points and Note
```sql
UPDATE theses
SET jury_points = 16.5,
context_note = 'Ce travail remarquable explore...',
jury_note_added = 1
WHERE id = 5;
```
#### Restrict Access (Libre → Interne)
```sql
UPDATE theses
SET access_type_id = (SELECT id FROM access_types WHERE name = 'Interne')
WHERE id = 10;
```
#### Update Page Content
```sql
UPDATE pages
SET content = 'Nouveau contenu de la page...',
updated_at = CURRENT_TIMESTAMP
WHERE slug = 'about';
```
### Deleting Data
**Warning**: Deletes are permanent in SQLite!
#### Delete a Thesis (and all related data)
```sql
-- This cascades to thesis_authors, thesis_keywords, etc.
DELETE FROM theses WHERE id = 10;
```
#### Remove Keyword from Thesis
```sql
DELETE FROM thesis_keywords
WHERE thesis_id = 5 AND keyword_id = 12;
```
#### Delete Unused Keywords
```sql
-- Remove keywords not linked to any thesis
DELETE FROM keywords
WHERE id NOT IN (SELECT DISTINCT keyword_id FROM thesis_keywords);
```
---
## Backup & Maintenance
### Backup Strategies
#### Method 1: File Copy (Simplest)
```bash
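# Copy the database file (do this only while nothing is writing to it;
# in WAL mode, pending changes may still sit in posterg.db-wal)
cp posterg.db posterg_backup_$(date +%Y%m%d).db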
# Or with compression
tar -czf posterg_backup_$(date +%Y%m%d).tar.gz posterg.db
```
#### Method 2: SQL Dump (Most Portable)
```bash
# Export entire database to SQL
sqlite3 posterg.db .dump > posterg_backup.sql
# Restore from backup
sqlite3 new_posterg.db < posterg_backup.sql
```
#### Method 3: Automated Backups
Create a backup script (`backup.sh`):
```bash
#!/bin/bash
BACKUP_DIR="/home/padlock/dev/posterg/db/backups"
DATE=$(date +%Y%m%d_%H%M%S)
DB_FILE="/home/padlock/dev/posterg/db/posterg.db"
mkdir -p "$BACKUP_DIR"
sqlite3 "$DB_FILE" ".backup '$BACKUP_DIR/posterg_$DATE.db'"
echo "Backup created: $BACKUP_DIR/posterg_$DATE.db"
# Keep only last 30 backups
ls -t "$BACKUP_DIR"/posterg_*.db | tail -n +31 | xargs rm -f
```
Run daily with cron:
```bash
# Edit crontab
crontab -e
# Add daily backup at 2am
0 2 * * * /home/padlock/dev/posterg/db/backup.sh
```
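The same online backup can also be taken from application code. The sketch below uses Python's `sqlite3.Connection.backup`; the function name `backup_database`, the file naming, and the demo paths are assumptions chosen to mirror the shell script above:

```python
import sqlite3
import tempfile
from datetime import datetime
from pathlib import Path


def backup_database(db_file, backup_dir, keep=30):
    """Copy db_file into backup_dir and prune all but the newest `keep` copies."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / f"posterg_{datetime.now():%Y%m%d_%H%M%S_%f}.db"
    source = sqlite3.connect(db_file)
    dest = sqlite3.connect(target)
    try:
        source.backup(dest)  # consistent copy even if the source is in use
    finally:
        dest.close()
        source.close()
    # Timestamped names sort chronologically, so the oldest come first.
    backups = sorted(backup_dir.glob("posterg_*.db"))
    for old in backups[:max(len(backups) - keep, 0)]:
        old.unlink()
    return target


# Demo in a temporary directory with a tiny stand-in database.
workdir = Path(tempfile.mkdtemp())
db_path = workdir / "posterg.db"
src = sqlite3.connect(db_path)
src.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
src.execute("INSERT INTO authors (name) VALUES ('Marie Dupont')")
src.commit()
src.close()

copy_path = backup_database(db_path, workdir / "backups")
check = sqlite3.connect(copy_path)
copied_names = [row[0] for row in check.execute("SELECT name FROM authors")]
check.close()
```

Unlike a plain file copy, the backup API reads through SQLite itself, so it does not depend on the database being idle.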
### Database Maintenance
#### Optimize Database (Vacuum)
Reclaim unused space and optimize performance:
```bash
sqlite3 posterg.db "VACUUM;"
```
**When to run**: After large deletions or monthly.
#### Analyze Database
Update query optimizer statistics:
```bash
sqlite3 posterg.db "ANALYZE;"
```
**When to run**: After significant data changes.
#### Check Integrity
Verify database integrity:
```bash
sqlite3 posterg.db "PRAGMA integrity_check;"
```
**Expected output**: `ok`
#### Database Statistics
```sql
-- Database size (PRAGMA table-valued functions need SQLite 3.16.0 or later)
SELECT page_count * page_size / 1024.0 / 1024.0 AS size_mb
FROM pragma_page_count(), pragma_page_size();
-- Row counts
SELECT 'theses' as table_name, COUNT(*) as rows FROM theses
UNION ALL
SELECT 'authors', COUNT(*) FROM authors
UNION ALL
SELECT 'keywords', COUNT(*) FROM keywords;
-- List indexes by table
SELECT name, tbl_name FROM sqlite_master
WHERE type = 'index'
ORDER BY tbl_name;
```
### Migration Best Practices
When updating the schema:
1. **Always back up first**:
   ```bash
   cp posterg.db posterg_before_migration.db
   ```
2. **Test the migration on a copy**:
   ```bash
   cp posterg.db posterg_test.db
   sqlite3 posterg_test.db < migration.sql
   ```
3. **Use transactions**:
   ```sql
   BEGIN TRANSACTION;
   -- Your changes here
   ALTER TABLE theses ADD COLUMN new_field TEXT;
   -- Test queries
   SELECT * FROM theses LIMIT 1;
   COMMIT; -- or ROLLBACK if something went wrong
   ```
4. **Document changes**:
   Create migration files like `migrations/001_add_new_field.sql`
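The transaction step can be automated. The Python sketch below is one possible approach, assuming `PRAGMA user_version` is used to record the schema version (this guide does not say how versions are tracked, so that choice and the name `apply_migration` are hypothetical):

```python
import sqlite3


def apply_migration(conn, version, statements):
    """Apply `statements` atomically unless the database is already at `version`.

    Returns True if the migration ran, False if it was skipped.
    """
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    if current >= version:
        return False
    conn.execute("BEGIN")
    try:
        for statement in statements:
            conn.execute(statement)
        conn.execute(f"PRAGMA user_version = {int(version)}")
        conn.execute("COMMIT")
    except sqlite3.Error:
        conn.execute("ROLLBACK")  # undo every statement of this migration
        raise
    return True


# isolation_level=None disables Python's implicit transactions, so the
# explicit BEGIN/COMMIT above are the only transaction boundaries.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE theses (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")

migration = ["ALTER TABLE theses ADD COLUMN new_field TEXT"]
first_run = apply_migration(conn, 1, migration)   # applies the change
second_run = apply_migration(conn, 1, migration)  # skipped: already at version 1
columns = [row[1] for row in conn.execute("PRAGMA table_info(theses)")]
```

Because SQLite supports transactional DDL, a failing statement leaves both the schema and the recorded version untouched.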
---
## Troubleshooting
### Common Issues
#### Database is Locked
**Symptom**: `Error: database is locked`
**Cause**: Another process has the database open for writing.
**Solution**:
```bash
# Find processes using the database
lsof posterg.db
# Or, as a last resort, kill the processes holding it open
fuser -k posterg.db
# Reduce lock contention with WAL mode (readers no longer block the writer)
sqlite3 posterg.db "PRAGMA journal_mode=WAL;"
```
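On the application side, a busy timeout lets a connection wait for a lock instead of failing immediately. A minimal Python sketch, assuming a helper named `open_database` (hypothetical) that every part of the application would use to connect:

```python
import sqlite3
import tempfile
from pathlib import Path


def open_database(path, busy_timeout_s=5.0):
    """Open the database with a busy timeout, WAL mode, and foreign keys on."""
    # timeout: seconds to wait on a locked database before raising an error
    conn = sqlite3.connect(path, timeout=busy_timeout_s)
    conn.execute("PRAGMA journal_mode = WAL")  # persistent: stored in the file
    conn.execute("PRAGMA foreign_keys = ON")   # per-connection setting
    return conn


# Demo on a temporary file (WAL mode is not available for in-memory databases).
db_path = Path(tempfile.mkdtemp()) / "posterg.db"
conn = open_database(db_path)
journal_mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
foreign_keys = conn.execute("PRAGMA foreign_keys").fetchone()[0]
print(journal_mode, foreign_keys)
```

Centralizing connection setup in one function also keeps the per-connection pragmas from being forgotten.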
#### Foreign Key Violations
**Symptom**: `FOREIGN KEY constraint failed`
**Cause**: Trying to insert a reference to a non-existent record.
**Solution**:
```sql
-- Check whether enforcement is on for this connection (1 = on)
PRAGMA foreign_keys;
-- Enable it (needed on every new connection)
PRAGMA foreign_keys = ON;
-- Verify referenced record exists
SELECT id FROM orientations WHERE id = 8;
```
#### Unique Constraint Violation
**Symptom**: `UNIQUE constraint failed`
**Solution**:
```sql
-- Use INSERT OR IGNORE to skip duplicates
INSERT OR IGNORE INTO keywords (keyword) VALUES ('performance');
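-- Or INSERT OR REPLACE to overwrite. Caution: REPLACE deletes the conflicting
-- row before inserting, so ON DELETE CASCADE can drop its thesis_keywords links
INSERT OR REPLACE INTO keywords (id, keyword) VALUES (1, 'performance');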
```
#### Cannot Find Database File
**Symptom**: `Error: unable to open database file`
**Solution**:
```bash
# Use absolute path
sqlite3 /home/padlock/dev/posterg/db/posterg.db
# Or navigate to directory first
cd /home/padlock/dev/posterg/db
sqlite3 posterg.db
```
### Performance Issues
#### Slow Queries
**Diagnosis**:
```sql
-- Enable query timer
.timer on
-- Explain query plan
EXPLAIN QUERY PLAN
SELECT * FROM theses WHERE year = 2025;
```
**Solutions**:
- Add indexes on frequently queried columns
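- Query base tables directly when only a few columns are needed; views such as `v_theses_full` are not materialized and re-run their joins on every query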
- Run `ANALYZE;` to update statistics
#### Large Database
**Solutions**:
```bash
# Reclaim space left by deleted rows (VACUUM does not compress data)
sqlite3 posterg.db "VACUUM;"
# Use WAL mode for better concurrency
sqlite3 posterg.db "PRAGMA journal_mode=WAL;"
# Archive old theses to separate database
```
### Data Quality Issues
#### Find Orphaned Records
```sql
-- Authors with no theses
SELECT a.* FROM authors a
LEFT JOIN thesis_authors ta ON a.id = ta.author_id
WHERE ta.author_id IS NULL;
-- Theses missing required fields
SELECT id, identifier, title FROM theses
WHERE orientation_id IS NULL OR finality_id IS NULL;
```
#### Validate Keyword Count
```sql
-- Theses with more than 10 keywords
SELECT thesis_id, COUNT(*) as keyword_count
FROM thesis_keywords
GROUP BY thesis_id
HAVING keyword_count > 10;
```
### Recovery Procedures
#### Restore from Backup
```bash
# From SQL dump
sqlite3 posterg_restored.db < posterg_backup.sql
# From database file
cp posterg_backup_20260127.db posterg.db
```
#### Corrupted Database
```bash
# Try to recover (.recover requires SQLite 3.29.0 or later)
sqlite3 posterg.db ".recover" | sqlite3 recovered.db
# Or dump and reimport
sqlite3 posterg.db .dump | sqlite3 new_posterg.db
```
---
## Advanced Tips
### Performance Optimization
```sql
-- Enable Write-Ahead Logging (WAL) for better concurrency
PRAGMA journal_mode=WAL;
-- Increase cache size (a negative value is a size in KiB)
PRAGMA cache_size=-64000; -- about 64MB cache
-- Enable memory-mapped I/O (in bytes)
PRAGMA mmap_size=268435456; -- 256MB
-- Synchronous mode (less safe but faster)
PRAGMA synchronous=NORMAL; -- Default is FULL
```
### Useful SQLite Commands
```sql
-- Export table to CSV
.mode csv
.output theses.csv
SELECT * FROM v_theses_public;
.output stdout
-- Import CSV
.mode csv
.import data.csv table_name
-- Show execution time
.timer on
-- Show query plan
.eqp on
-- Pretty formatting
.mode column
.headers on
.width 10 40 20
-- Run frequently used queries stored in a file
.read my_queries.sql
```
### Custom Functions (Application Level)
When building your application, you can create custom SQLite functions:
**Python example**:
```python
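import sqlite3


def keyword_count(thesis_id):
    """Custom function to count keywords (placeholder: returns NULL for now)."""
    # Implementation
    return None


conn = sqlite3.connect('posterg.db')
# Register it so SQL can call keyword_count(<thesis_id>) with one argument
conn.create_function('keyword_count', 1, keyword_count)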
```
---
## Next Steps
After setting up the database:
1. **Import existing data** from `Database_TFE_test.csv`
   - Create import script (Python/Node.js recommended)
   - Parse CSV and map to schema
   - Handle comma-separated values
   - Validate data quality
2. **Define license types**
   - Consult with legal/admin
   - Populate `license_types` table
3. **Build application layer**
   - REST API or GraphQL
   - Authentication/authorization
   - File upload handling
   - Email notifications
4. **Create admin interface**
   - CRUD operations for all entities
   - Bulk import/export
   - User management
   - Workflow management
5. **Build public website**
   - Search and filter
   - Thesis display
   - Respect access controls
   - Static pages management
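For step 1, an import script might start out like the Python sketch below. The CSV column names (`title`, `year`, `keywords`) are purely hypothetical, since the real layout of `Database_TFE_test.csv` is not described here, and the schema is a trimmed in-memory stand-in:

```python
import csv
import io
import sqlite3

# Hypothetical CSV layout; the real columns of Database_TFE_test.csv may differ.
SAMPLE_CSV = (
    "title,year,keywords\n"
    'Mon Titre de TFE,2026,"performance, urbanisme"\n'
)


def split_multi(value):
    """Split a comma-separated cell into trimmed, non-empty values."""
    return [part.strip() for part in value.split(",") if part.strip()]


def import_rows(conn, csv_file):
    """Insert one thesis per CSV row and link its keywords."""
    with conn:  # a single transaction: all rows or none
        for row in csv.DictReader(csv_file):
            cur = conn.execute(
                "INSERT INTO theses (title, year) VALUES (?, ?)",
                (row["title"].strip(), int(row["year"])))
            thesis_id = cur.lastrowid
            for keyword in split_multi(row["keywords"]):
                conn.execute(
                    "INSERT OR IGNORE INTO keywords (keyword) VALUES (?)",
                    (keyword,))
                conn.execute(
                    """INSERT OR IGNORE INTO thesis_keywords (thesis_id, keyword_id)
                       SELECT ?, id FROM keywords WHERE keyword = ?""",
                    (thesis_id, keyword))


# Demo against a trimmed, in-memory stand-in for the real schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE theses (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        title TEXT NOT NULL,
        year INTEGER NOT NULL
    );
    CREATE TABLE keywords (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        keyword TEXT NOT NULL UNIQUE
    );
    CREATE TABLE thesis_keywords (
        thesis_id INTEGER NOT NULL,
        keyword_id INTEGER NOT NULL,
        PRIMARY KEY (thesis_id, keyword_id)
    );
""")
import_rows(conn, io.StringIO(SAMPLE_CSV))
linked = [row[0] for row in conn.execute(
    """SELECT k.keyword FROM thesis_keywords tk
       JOIN keywords k ON k.id = tk.keyword_id
       ORDER BY k.keyword""")]
```

Authors, supervisors, languages, and formats could be handled the same way as keywords, each through its own junction table.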
---
## Resources
### SQLite Documentation
- Official docs: https://sqlite.org/docs.html
- SQL syntax: https://sqlite.org/lang.html
- Datatypes: https://sqlite.org/datatype3.html
### Tools
- DB Browser: https://sqlitebrowser.org/
- sqlite-web: https://github.com/coleifer/sqlite-web
- SQLite CLI: https://sqlite.org/cli.html
### Best Practices
- Always use transactions for multiple operations
- Enable foreign keys: `PRAGMA foreign_keys = ON;`
- Backup before schema changes
- Use prepared statements in applications
- Index frequently queried columns
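To make the prepared-statement advice concrete, here is a small Python sketch of a title search that binds user input as a parameter and escapes `LIKE` wildcards. The function names `escape_like` and `search_titles` and the trimmed schema are assumptions for illustration:

```python
import sqlite3


def escape_like(term, escape="\\"):
    """Escape LIKE wildcards so user input is matched literally."""
    return (term.replace(escape, escape + escape)
                .replace("%", escape + "%")
                .replace("_", escape + "_"))


def search_titles(conn, term):
    """Return titles containing `term`, passing it as a bound parameter."""
    pattern = f"%{escape_like(term)}%"
    rows = conn.execute(
        "SELECT title FROM theses WHERE title LIKE ? ESCAPE '\\' ORDER BY title",
        (pattern,))
    return [row[0] for row in rows]


# Demo against a trimmed, in-memory stand-in for the real schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE theses (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO theses (title) VALUES (?)",
    [("100% Vidéo",), ("1000 Images",), ("L'art du multiple",)])

literal_match = search_titles(conn, "100%")   # '%' is matched literally
hostile = search_titles(conn, "'; DROP TABLE theses; --")  # treated as plain text
table_count = conn.execute("SELECT COUNT(*) FROM theses").fetchone()[0]
```

The bound parameter prevents SQL injection, while the escaping step stops a user-supplied `%` or `_` from acting as a wildcard.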
---
## Support
For issues related to:
- **Schema design**: Review this document and README.md
- **Data import**: Check CSV format and data types
- **Performance**: Run `ANALYZE` and check indexes
- **Corruption**: Restore from backup
---
**Last Updated**: 2026-01-27
**Schema Version**: 1.0
**Database**: SQLite 3