# Post-ERG Thesis Database - Setup Guide

Complete guide for setting up and managing the SQLite database for the Post-ERG thesis archive platform.

---

## Table of Contents

1. [Quick Start](#quick-start)
2. [Prerequisites](#prerequisites)
3. [Database Setup](#database-setup)
4. [Schema Overview](#schema-overview)
5. [Detailed Schema Description](#detailed-schema-description)
6. [Common Operations](#common-operations)
7. [Backup & Maintenance](#backup--maintenance)
8. [Troubleshooting](#troubleshooting)

---

## Quick Start

For the impatient, here's the fastest way to get started:

```bash
# Navigate to the database directory
cd /home/padlock/dev/posterg/db

# Create the database and apply schema
sqlite3 posterg.db < schema.sql

# Verify the database was created
sqlite3 posterg.db "SELECT name FROM sqlite_master WHERE type='table';"

# Check predefined data was loaded
sqlite3 posterg.db "SELECT * FROM orientations;"
```

You now have a fully initialized Post-ERG thesis database!

---

## Prerequisites

### Required Software

- **SQLite 3** (version 3.8.0 or higher recommended)
  - Check version: `sqlite3 --version`
  - Install on Linux: `sudo apt-get install sqlite3`
  - Install on macOS: `brew install sqlite3` (usually pre-installed)
  - Install on Windows: Download from [sqlite.org/download.html](https://sqlite.org/download.html)

### Optional Tools

- **DB Browser for SQLite** - GUI tool for database management
  - Download: [sqlitebrowser.org](https://sqlitebrowser.org/)
  - Great for visual exploration and testing
- **sqlite-web** - Web-based SQLite database browser

  ```bash
  pip install sqlite-web
  sqlite_web posterg.db
  ```

---

## Database Setup

### Step 1: Project Structure

Ensure your directory structure looks like this:

```
/home/padlock/dev/posterg/db/
├── schema.sql                  # Database schema definition
├── Database_TFE_test.csv       # Sample/test CSV data
├── posterg_fiche-technique.md  # Technical specifications
├── SETUP.md                    # This file
├── README.md                   # Schema documentation
└── posterg.db                  # Database file (created in next step)
```

### Step 2: Create the Database

Create an empty SQLite database and apply the schema:

```bash
# Method 1: Using shell redirection (recommended)
sqlite3 posterg.db < schema.sql

# Method 2: Interactive mode
sqlite3 posterg.db
sqlite> .read schema.sql
sqlite> .quit

# Method 3: One-liner
cat schema.sql | sqlite3 posterg.db
```

### Step 3: Verify Installation

Check that all tables were created successfully:

```bash
sqlite3 posterg.db "SELECT name FROM sqlite_master WHERE type='table';"
```

---

## Common Operations

### Querying Data

#### Find Theses by Year

```sql
SELECT title, year
FROM theses
WHERE year >= 2024
ORDER BY year DESC, title;
```

#### Search by Keyword

```sql
SELECT t.title, t.year, GROUP_CONCAT(k.keyword) AS keywords
FROM theses t
JOIN thesis_keywords tk ON t.id = tk.thesis_id
JOIN keywords k ON tk.keyword_id = k.id
WHERE k.keyword LIKE '%performance%'
GROUP BY t.id;
```

#### Find Theses by Author

```sql
SELECT t.title, t.year, a.name AS author
FROM theses t
JOIN thesis_authors ta ON t.id = ta.thesis_id
JOIN authors a ON ta.author_id = a.id
WHERE a.name LIKE '%Lucie%'
ORDER BY t.year DESC;
```

#### Get Unpublished Theses (Admin)

```sql
SELECT identifier, title, submitted_at, defense_date
FROM theses
WHERE submitted_at IS NOT NULL
  AND is_published = 0
ORDER BY submitted_at DESC;
```

### Inserting Data

#### Add a New Author

```sql
INSERT INTO authors (name, email)
VALUES ('Marie Dupont', 'marie.dupont@example.com');
```

#### Add a New Thesis (Basic)

```sql
INSERT INTO theses (
    identifier, title, year, orientation_id, finality_id, synopsis
) VALUES (
    '2026-001',
    'Mon Titre de TFE',
    2026,
    8,  -- Vidéographie
    1,  -- Approfondi
    'Un synopsis fascinant de mon travail...'
);
```

#### Link Thesis to Author

```sql
-- Get thesis ID and author ID first
INSERT INTO thesis_authors (thesis_id, author_id, author_order)
VALUES (1, 5, 1);
```

#### Add Keywords to Thesis

```sql
-- First, ensure keyword exists
INSERT OR IGNORE INTO keywords (keyword) VALUES ('performance');

-- Then link it
INSERT INTO thesis_keywords (thesis_id, keyword_id)
SELECT 1, id FROM keywords WHERE keyword = 'performance';
```

### Updating Data

#### Update Thesis Status to Published

```sql
UPDATE theses
SET is_published = 1,
    published_at = CURRENT_TIMESTAMP
WHERE id = 5;
```

#### Add Jury Points and Note

```sql
UPDATE theses
SET jury_points = 16.5,
    context_note = 'Ce travail remarquable explore...',
    jury_note_added = 1
WHERE id = 5;
```

#### Restrict Access (Libre → Interne)

```sql
UPDATE theses
SET access_type_id = (SELECT id FROM access_types WHERE name = 'Interne')
WHERE id = 10;
```

#### Update Page Content

```sql
UPDATE pages
SET content = 'Nouveau contenu de la page...',
    updated_at = CURRENT_TIMESTAMP
WHERE slug = 'about';
```

### Deleting Data

**Warning**: Deletes are permanent in SQLite!

#### Delete a Thesis (and all related data)

```sql
-- This cascades to thesis_authors, thesis_keywords, etc.
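-- Note: SQLite enforces foreign keys (and thus ON DELETE CASCADE)
-- only when enabled for the current connection:
PRAGMA foreign_keys = ON;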
DELETE FROM theses WHERE id = 10;
```

#### Remove Keyword from Thesis

```sql
DELETE FROM thesis_keywords
WHERE thesis_id = 5 AND keyword_id = 12;
```

#### Delete Unused Keywords

```sql
-- Remove keywords not linked to any thesis
DELETE FROM keywords
WHERE id NOT IN (SELECT DISTINCT keyword_id FROM thesis_keywords);
```

---

## Backup & Maintenance

### Backup Strategies

#### Method 1: File Copy (Simplest)

```bash
# Copy the database file
cp posterg.db posterg_backup_$(date +%Y%m%d).db

# Or with compression
tar -czf posterg_backup_$(date +%Y%m%d).tar.gz posterg.db
```

#### Method 2: SQL Dump (Most Portable)

```bash
# Export entire database to SQL
sqlite3 posterg.db .dump > posterg_backup.sql

# Restore from backup
sqlite3 new_posterg.db < posterg_backup.sql
```

#### Method 3: Automated Backups

Create a backup script (`backup.sh`):

```bash
#!/bin/bash
BACKUP_DIR="/home/padlock/dev/posterg/db/backups"
DATE=$(date +%Y%m%d_%H%M%S)
DB_FILE="/home/padlock/dev/posterg/db/posterg.db"

mkdir -p "$BACKUP_DIR"
sqlite3 "$DB_FILE" ".backup '$BACKUP_DIR/posterg_$DATE.db'"
echo "Backup created: $BACKUP_DIR/posterg_$DATE.db"

# Keep only the last 30 backups
ls -t "$BACKUP_DIR"/posterg_*.db | tail -n +31 | xargs rm -f
```

Run daily with cron:

```bash
# Edit crontab
crontab -e

# Add daily backup at 2am
0 2 * * * /home/padlock/dev/posterg/db/backup.sh
```

### Database Maintenance

#### Optimize Database (Vacuum)

Reclaim unused space and optimize performance:

```bash
sqlite3 posterg.db "VACUUM;"
```

**When to run**: After large deletions, or monthly.

#### Analyze Database

Update query optimizer statistics:

```bash
sqlite3 posterg.db "ANALYZE;"
```

**When to run**: After significant data changes.
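The shell-based backups above can also be driven from application code: Python's standard `sqlite3` module exposes SQLite's online backup API as `Connection.backup()` (Python 3.7+). A minimal sketch, with an illustrative function name and file layout:

```python
import sqlite3
from datetime import datetime
from pathlib import Path

def backup_database(db_path, backup_dir):
    """Snapshot a live SQLite database using the online backup API."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    dest_path = backup_dir / f"posterg_{datetime.now():%Y%m%d_%H%M%S}.db"

    src = sqlite3.connect(db_path)
    dest = sqlite3.connect(dest_path)
    try:
        # Copies all pages page-by-page; the snapshot stays consistent
        # even if other connections write during the copy.
        src.backup(dest)
    finally:
        dest.close()
        src.close()
    return dest_path
```

Unlike a plain `cp` of a database that is being written to, the backup API always produces a consistent copy.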
#### Check Integrity

Verify database integrity:

```bash
sqlite3 posterg.db "PRAGMA integrity_check;"
```

**Expected output**: `ok`

#### Database Statistics

```sql
-- Database size
SELECT page_count * page_size / 1024 / 1024.0 AS size_mb
FROM pragma_page_count(), pragma_page_size();

-- Row counts
SELECT 'theses' AS table_name, COUNT(*) AS rows FROM theses
UNION ALL SELECT 'authors', COUNT(*) FROM authors
UNION ALL SELECT 'keywords', COUNT(*) FROM keywords;

-- Index usage
SELECT name, tbl_name FROM sqlite_master
WHERE type = 'index'
ORDER BY tbl_name;
```

### Migration Best Practices

When updating the schema:

1. **Always backup first**:

   ```bash
   cp posterg.db posterg_before_migration.db
   ```

2. **Test the migration on a copy**:

   ```bash
   sqlite3 posterg_test.db < migration.sql
   ```

3. **Use transactions**:

   ```sql
   BEGIN TRANSACTION;

   -- Your changes here
   ALTER TABLE theses ADD COLUMN new_field TEXT;

   -- Test queries
   SELECT * FROM theses LIMIT 1;

   COMMIT; -- or ROLLBACK if something went wrong
   ```

4. **Document changes**: Create migration files like `migrations/001_add_new_field.sql`

---

## Troubleshooting

### Common Issues

#### Database is Locked

**Symptom**: `Error: database is locked`

**Cause**: Another process has the database open for writing.

**Solution**:

```bash
# Find processes using the database
lsof posterg.db

# Or force close
fuser -k posterg.db

# Prevent by using WAL mode
sqlite3 posterg.db "PRAGMA journal_mode=WAL;"
```

#### Foreign Key Violations

**Symptom**: `FOREIGN KEY constraint failed`

**Cause**: Trying to insert a reference to a non-existent record.
**Solution**:

```sql
-- Enable foreign key enforcement (check that it's on)
PRAGMA foreign_keys = ON;

-- Verify the referenced record exists
SELECT id FROM orientations WHERE id = 8;
```

#### Unique Constraint Violation

**Symptom**: `UNIQUE constraint failed`

**Solution**:

```sql
-- Use INSERT OR IGNORE to skip duplicates
INSERT OR IGNORE INTO keywords (keyword) VALUES ('performance');

-- Or INSERT OR REPLACE to update
INSERT OR REPLACE INTO keywords (id, keyword) VALUES (1, 'performance');
```

#### Cannot Find Database File

**Symptom**: `Error: unable to open database file`

**Solution**:

```bash
# Use an absolute path
sqlite3 /home/padlock/dev/posterg/db/posterg.db

# Or navigate to the directory first
cd /home/padlock/dev/posterg/db
sqlite3 posterg.db
```

### Performance Issues

#### Slow Queries

**Diagnosis**:

```sql
-- Enable query timer
.timer on

-- Explain query plan
EXPLAIN QUERY PLAN
SELECT * FROM theses WHERE year = 2025;
```

**Solutions**:

- Add indexes on frequently queried columns
- Use views for complex queries
- Run `ANALYZE;` to update statistics

#### Large Database

**Solutions**:

```bash
# Reclaim space left by deleted data
sqlite3 posterg.db "VACUUM;"

# Use WAL mode for better concurrency
sqlite3 posterg.db "PRAGMA journal_mode=WAL;"

# Archive old theses to a separate database
```

### Data Quality Issues

#### Find Orphaned Records

```sql
-- Authors with no theses
SELECT a.* FROM authors a
LEFT JOIN thesis_authors ta ON a.id = ta.author_id
WHERE ta.author_id IS NULL;

-- Theses missing required fields
SELECT id, identifier, title FROM theses
WHERE orientation_id IS NULL OR finality_id IS NULL;
```

#### Validate Keyword Count

```sql
-- Theses with more than 10 keywords
SELECT thesis_id, COUNT(*) AS keyword_count
FROM thesis_keywords
GROUP BY thesis_id
HAVING keyword_count > 10;
```

### Recovery Procedures

#### Restore from Backup

```bash
# From SQL dump
sqlite3 posterg_restored.db < posterg_backup.sql

# From database file
cp posterg_backup_20260127.db posterg.db
```

#### Corrupted Database

```bash
# Try to recover (requires a recent SQLite CLI with the .recover command)
sqlite3 posterg.db ".recover" | sqlite3 recovered.db

# Or dump and reimport
sqlite3 posterg.db .dump | sqlite3 new_posterg.db
```

---

## Advanced Tips

### Performance Optimization

```sql
-- Enable Write-Ahead Logging (WAL) for better concurrency
PRAGMA journal_mode=WAL;

-- Increase cache size (a negative value is in KiB)
PRAGMA cache_size=-64000;  -- 64MB cache

-- Enable memory-mapped I/O (in bytes)
PRAGMA mmap_size=268435456;  -- 256MB

-- Synchronous mode (less safe but faster)
PRAGMA synchronous=NORMAL;  -- Default is FULL
```

### Useful SQLite Commands

```sql
-- Export table to CSV
.mode csv
.output theses.csv
SELECT * FROM v_theses_public;
.output stdout

-- Import CSV
.mode csv
.import data.csv table_name

-- Show execution time
.timer on

-- Show query plan
.eqp on

-- Pretty formatting
.mode column
.headers on
.width 10 40 20

-- Write the current database out to a file
.save my_copy.db
```

### Custom Functions (Application Level)

When building your application, you can create custom SQLite functions:

**Python example**:

```python
import sqlite3

def keyword_count(thesis_id):
    """Custom scalar function: count keywords for a thesis."""
    # Implementation goes here; the return value becomes the SQL result
    pass

conn = sqlite3.connect('posterg.db')
conn.create_function('keyword_count', 1, keyword_count)
```

---

## Next Steps

After setting up the database:

1. **Import existing data** from `Database_TFE_test.csv`
   - Create an import script (Python/Node.js recommended)
   - Parse the CSV and map it to the schema
   - Handle comma-separated values
   - Validate data quality

2. **Define license types**
   - Consult with legal/admin
   - Populate the `license_types` table

3. **Build application layer**
   - REST API or GraphQL
   - Authentication/authorization
   - File upload handling
   - Email notifications

4. **Create admin interface**
   - CRUD operations for all entities
   - Bulk import/export
   - User management
   - Workflow management

5. **Build public website**
   - Search and filter
   - Thesis display
   - Respect access controls
   - Static pages management

---

## Resources

### SQLite Documentation

- Official docs: https://sqlite.org/docs.html
- SQL syntax: https://sqlite.org/lang.html
- Datatypes: https://sqlite.org/datatype3.html

### Tools

- DB Browser: https://sqlitebrowser.org/
- sqlite-web: https://github.com/coleifer/sqlite-web
- SQLite CLI: https://sqlite.org/cli.html

### Best Practices

- Always use transactions for multiple operations
- Enable foreign keys: `PRAGMA foreign_keys = ON;`
- Backup before schema changes
- Use prepared statements in applications
- Index frequently queried columns

---

## Support

For issues related to:

- **Schema design**: Review this document and README.md
- **Data import**: Check the CSV format and data types
- **Performance**: Run `ANALYZE` and check indexes
- **Corruption**: Restore from backup

---

**Last Updated**: 2026-01-27
**Schema Version**: 1.0
**Database**: SQLite 3