Post-ERG Thesis Database - Setup Guide

Complete guide for setting up and managing the SQLite database for the Post-ERG thesis archive platform.


Table of Contents

  1. Quick Start
  2. Prerequisites
  3. Database Setup
  4. Schema Overview
  5. Detailed Schema Description
  6. Common Operations
  7. Backup & Maintenance
  8. Troubleshooting

Quick Start

For the impatient, here's the fastest way to get started:

# Navigate to the database directory
cd /home/padlock/dev/posterg/db

# Create the database and apply schema
sqlite3 posterg.db < schema.sql

# Verify the database was created
sqlite3 posterg.db "SELECT name FROM sqlite_master WHERE type='table';"

# Check predefined data was loaded
sqlite3 posterg.db "SELECT * FROM orientations;"

You now have a fully initialized Post-ERG thesis database!


Prerequisites

Required Software

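  • SQLite 3 (version 3.8.0 or higher recommended; the pragma_page_count() / pragma_page_size() query under Database Statistics needs 3.16.0+, and the .recover command needs 3.29.0+)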
    • Check version: sqlite3 --version
    • Install on Linux: sudo apt-get install sqlite3
    • Install on macOS: brew install sqlite3 (usually pre-installed)
    • Install on Windows: Download from sqlite.org/download.html

Optional Tools

  • DB Browser for SQLite - GUI tool for database management

  • sqlite-web - Web-based SQLite database browser

    pip install sqlite-web
    sqlite_web posterg.db
    

Database Setup

Step 1: Project Structure

Ensure your directory structure looks like this:

/home/padlock/dev/posterg/db/
├── schema.sql                    # Database schema definition
├── Database_TFE_test.csv         # Sample/test CSV data
├── posterg_fiche-technique.md    # Technical specifications
├── SETUP.md                      # This file
├── README.md                     # Schema documentation
└── posterg.db                    # Database file (created in next step)

Step 2: Create the Database

Create an empty SQLite database and apply the schema:

# Method 1: Using shell redirection (recommended)
sqlite3 posterg.db < schema.sql

# Method 2: Interactive mode
sqlite3 posterg.db
sqlite> .read schema.sql
sqlite> .quit

# Method 3: One-liner
cat schema.sql | sqlite3 posterg.db
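
If you would rather create the database from application code, the same step can be sketched with Python's standard sqlite3 module. This is a minimal sketch, not part of the project: init_db and list_tables are hypothetical helper names, and the demo runs against a throwaway one-table schema rather than the real schema.sql.

```python
import sqlite3
import tempfile
from pathlib import Path


def init_db(db_path, schema_path):
    """Create (or open) db_path and run every statement in schema_path."""
    schema_sql = Path(schema_path).read_text(encoding="utf-8")
    conn = sqlite3.connect(str(db_path))
    try:
        conn.executescript(schema_sql)  # runs the whole script in one call
        conn.commit()
    finally:
        conn.close()


def list_tables(db_path):
    """Return the names of all user tables, sorted alphabetically."""
    conn = sqlite3.connect(str(db_path))
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


# Demo with a throwaway schema (NOT the real schema.sql).
workdir = Path(tempfile.mkdtemp())
demo_schema = workdir / "schema.sql"
demo_schema.write_text(
    "CREATE TABLE orientations ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT NOT NULL UNIQUE);\n"
    "INSERT INTO orientations (name) VALUES ('Dessin');\n",
    encoding="utf-8",
)
demo_db = workdir / "demo.db"
init_db(demo_db, demo_schema)
tables = list_tables(demo_db)
print(tables)
```

Running init_db twice against the same file fails with "table already exists" unless the schema uses CREATE TABLE IF NOT EXISTS, so treat it as a one-time setup step.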

Step 3: Verify Installation

Check that all tables were created successfully:

sqlite3 posterg.db <<EOF
-- List all tables
.tables

-- Count tables (should be ~20)
SELECT COUNT(*) as table_count
FROM sqlite_master
WHERE type='table';

-- Show schema for main theses table
.schema theses

-- Verify predefined data
SELECT COUNT(*) FROM orientations;  -- Should return 15
SELECT COUNT(*) FROM ap_programs;   -- Should return 4
SELECT COUNT(*) FROM finality_types; -- Should return 3
SELECT COUNT(*) FROM access_types;   -- Should return 3
SELECT COUNT(*) FROM pages;          -- Should return 4
EOF

Expected output:

  • 15 orientations
  • 4 AP programs
  • 3 finality types
  • 3 access types
  • 4 static pages (charte, about, licenses, contact)
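
The same check can be automated. The sketch below compares row counts against the expected numbers listed above; check_seed_counts is a hypothetical helper, and the demo uses an in-memory stand-in table so it runs without posterg.db.

```python
import sqlite3

# Expected row counts, copied from the list above.
EXPECTED_COUNTS = {
    "orientations": 15,
    "ap_programs": 4,
    "finality_types": 3,
    "access_types": 3,
    "pages": 4,
}


def check_seed_counts(conn, expected):
    """Return {table: (expected, actual)} for every table whose count differs."""
    mismatches = {}
    for table, want in expected.items():
        # Table names cannot be bound as parameters; they come from the
        # trusted dict above, never from user input.
        (got,) = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()
        if got != want:
            mismatches[table] = (want, got)
    return mismatches


# Demo: an in-memory table that is deliberately one row short.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE access_types (id INTEGER PRIMARY KEY, name TEXT)")
demo.executemany(
    "INSERT INTO access_types (name) VALUES (?)", [("Libre",), ("Interne",)]
)
problems = check_seed_counts(demo, {"access_types": 3})
print(problems)
```

Against the real database, check_seed_counts(sqlite3.connect("posterg.db"), EXPECTED_COUNTS) should return an empty dict if the schema loaded correctly.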

Step 4: Test the Database

Run a simple test query:

sqlite3 posterg.db "SELECT name FROM orientations ORDER BY id LIMIT 5;"

Should output:

Arts Numériques
Dessin
Cinéma d'animation
Installation-Performance
Peinture

Schema Overview

Database Architecture

The Post-ERG database uses a relational model with proper normalization (3NF) to ensure:

  • Data integrity
  • No redundancy
  • Flexible querying
  • Easy maintenance

Entity-Relationship Diagram (Conceptual)

┌─────────────┐       ┌──────────────┐       ┌──────────────┐
│   Authors   │───────│Thesis Authors│───────│   Theses     │
└─────────────┘  1:N  └──────────────┘  N:1  └──────────────┘
                                                     │
                                                     │ N:1
┌─────────────┐       ┌──────────────┐              │
│ Supervisors │───────│Thesis Supvrs │──────────────┤
└─────────────┘  1:N  └──────────────┘              │
                                                     │
┌─────────────┐       ┌──────────────┐              │
│  Keywords   │───────│Thesis Keywrds│──────────────┤
└─────────────┘  1:N  └──────────────┘              │
                                                     │
┌─────────────┐       ┌──────────────┐              │
│  Languages  │───────│Thesis Langs  │──────────────┤
└─────────────┘  1:N  └──────────────┘              │
                                                     │
┌─────────────┐       ┌──────────────┐              │
│Format Types │───────│Thesis Formats│──────────────┤
└─────────────┘  1:N  └──────────────┘              │
                                                     │
┌─────────────┐                                     │
│Orientations │─────────────────────────────────────┤
└─────────────┘                                N:1  │
                                                     │
┌─────────────┐                                     │
│ AP Programs │─────────────────────────────────────┤
└─────────────┘                                N:1  │
                                                     │
┌─────────────┐                                     │
│Finality Type│─────────────────────────────────────┤
└─────────────┘                                N:1  │
                                                     │
┌─────────────┐                                     │
│Access Types │─────────────────────────────────────┤
└─────────────┘                                N:1  │
                                                     │
┌─────────────┐                                     │
│License Types│─────────────────────────────────────┤
└─────────────┘                                N:1  │
                                                     │
┌─────────────┐                                     │
│Thesis Files │─────────────────────────────────────┘
└─────────────┘                                1:N

Table Categories

1. Core Tables (5 tables)

  • theses - Main thesis records
  • authors - Student/author information
  • supervisors - Thesis promoters
  • thesis_files - File uploads
  • pages - Static content pages

2. Reference/Lookup Tables (8 tables)

  • orientations - Academic orientations
  • ap_programs - AP programs (ateliers)
  • finality_types - Master finality
  • languages - Thesis languages
  • format_types - Work formats
  • keywords - Thesis keywords
  • access_types - Access levels
  • license_types - License options

3. Junction Tables (5 tables)

  • thesis_authors - Many-to-many: theses ↔ authors
  • thesis_supervisors - Many-to-many: theses ↔ supervisors
  • thesis_languages - Many-to-many: theses ↔ languages
  • thesis_formats - Many-to-many: theses ↔ formats
  • thesis_keywords - Many-to-many: theses ↔ keywords

4. Views (2 views)

  • v_theses_full - Complete thesis data (admin view)
  • v_theses_public - Published theses only (public view)

Detailed Schema Description

Core Tables

theses - The Heart of the Database

The central table storing all thesis metadata and state information.

Purpose: Store all thesis projects (TFE and doctoral theses) with complete metadata, publication workflow state, and access control.

Columns:

| Column | Type | Description |
|---|---|---|
| id | INTEGER PK | Unique identifier (auto-increment) |
| identifier | TEXT UNIQUE | Human-readable ID (e.g., "2025-002") |
| Basic Information | | |
| title | TEXT NOT NULL | Thesis title |
| subtitle | TEXT | Optional subtitle |
| year | INTEGER NOT NULL | Year of submission/defense |
| is_doctoral | BOOLEAN | 0=TFE, 1=Doctoral thesis |
| Academic Details | | |
| orientation_id | INTEGER FK | Links to orientations table |
| ap_program_id | INTEGER FK | Links to ap_programs table |
| finality_id | INTEGER FK | Links to finality_types table |
| Content | | |
| synopsis | TEXT | ~200 word description |
| context_note | TEXT | Jury president note (max 150 words) |
| remarks | TEXT | Internal remarks/notes |
| Duration/Size | | |
| duration_minutes | INTEGER | For audio/video works |
| duration_pages | INTEGER | For written works |
| file_size_info | TEXT | Free-form size description |
| Access & Licensing | | |
| access_type_id | INTEGER FK | Libre/Interne/Interdit |
| license_id | INTEGER FK | Links to license_types |
| Jury Information | | |
| jury_points | DECIMAL(4,2) | Points out of 20 (e.g., 15.50) |
| jury_note_added | BOOLEAN | Whether jury added context note |
| Publication Workflow | | |
| submitted_at | DATETIME | When student submitted |
| defense_date | DATETIME | Date of soutenance |
| published_at | DATETIME | When made public |
| is_published | BOOLEAN | Publication status flag |
| External Links | | |
| baiu_link | TEXT | Link to BAIU repository |
| Timestamps | | |
| created_at | DATETIME | Record creation (auto) |
| updated_at | DATETIME | Last modification (auto-updated) |

Indexes:

  • idx_theses_year - Fast filtering by year
  • idx_theses_published - Quick access to published theses
  • idx_theses_identifier - Fast lookup by identifier
  • idx_theses_orientation - Filter by orientation
  • idx_theses_ap_program - Filter by AP program
  • idx_theses_access_type - Filter by access level

Business Logic:

  1. Publication Workflow:

    Student submits → submitted_at set
    Defense happens → defense_date set
    Jury reviews → jury_points + context_note added
    Publication → published_at set, is_published = 1
    
  2. Access Control:

    • Libre: Full access everywhere
    • Interne: Physical only, note online
    • Interdit: No access, note online only
    • Important: Access can only be made more restrictive, never less

authors - Student Information

Purpose: Store unique author/student records to avoid duplication.

Columns:

| Column | Type | Description |
|---|---|---|
| id | INTEGER PK | Unique author ID |
| name | TEXT NOT NULL | Full name |
| email | TEXT | Contact email (optional) |
| created_at | DATETIME | When added |
| updated_at | DATETIME | Last modified |

Indexed: email for fast lookup

Relationships: One author can have multiple theses (via thesis_authors)

supervisors - Thesis Promoters

Purpose: Store unique supervisor/promoter records.

Columns:

| Column | Type | Description |
|---|---|---|
| id | INTEGER PK | Unique supervisor ID |
| name | TEXT NOT NULL | Full name |
| created_at | DATETIME | When added |
| updated_at | DATETIME | Last modified |

Relationships: One supervisor can supervise multiple theses (via thesis_supervisors)

thesis_files - File Attachments

Purpose: Track all uploaded files associated with a thesis.

Columns:

| Column | Type | Description |
|---|---|---|
| id | INTEGER PK | Unique file ID |
| thesis_id | INTEGER FK | Links to theses |
| file_type | TEXT | 'main', 'annex', 'written_part', 'other' |
| file_path | TEXT NOT NULL | Server path to file |
| file_name | TEXT NOT NULL | Original filename |
| file_size | INTEGER | Size in bytes |
| mime_type | TEXT | MIME type (e.g., 'application/pdf') |
| description | TEXT | Optional file description |
| uploaded_at | DATETIME | Upload timestamp |

File Types:

  • main - The primary TFE work
  • annex - Supporting materials
  • written_part - Written thesis component
  • other - Miscellaneous files

Cascade Delete: When a thesis is deleted, its thesis_files rows are deleted automatically. The files themselves stay on disk, so the application has to remove them separately.
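
One way to keep the disk and the database in step is to read the file paths before deleting the thesis, then remove the files once the delete has committed. The sketch below assumes file_path holds a path the application can delete directly; delete_thesis_with_files is a hypothetical helper, and the two-table schema is a cut-down stand-in for the real one.

```python
import os
import sqlite3
import tempfile


def delete_thesis_with_files(conn, thesis_id):
    """Delete a thesis row, then remove the files its thesis_files rows pointed to."""
    # Collect the paths first: after the DELETE the rows are gone.
    paths = [
        path
        for (path,) in conn.execute(
            "SELECT file_path FROM thesis_files WHERE thesis_id = ?", (thesis_id,)
        )
    ]
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM theses WHERE id = ?", (thesis_id,))
    # Only touch the disk once the database change is committed.
    removed = []
    for path in paths:
        try:
            os.remove(path)
            removed.append(path)
        except FileNotFoundError:
            pass  # already gone; nothing to clean up
    return removed


# Demo schema: a cut-down stand-in for the real tables.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for ON DELETE CASCADE
conn.executescript("""
    CREATE TABLE theses (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE thesis_files (
        id INTEGER PRIMARY KEY,
        thesis_id INTEGER NOT NULL REFERENCES theses(id) ON DELETE CASCADE,
        file_path TEXT NOT NULL
    );
""")
fd, tmp_path = tempfile.mkstemp(suffix=".pdf")
os.close(fd)
conn.execute("INSERT INTO theses (id, title) VALUES (1, 'Demo')")
conn.execute(
    "INSERT INTO thesis_files (thesis_id, file_path) VALUES (1, ?)", (tmp_path,)
)
conn.commit()

removed = delete_thesis_with_files(conn, 1)
remaining = conn.execute("SELECT COUNT(*) FROM thesis_files").fetchone()[0]
```

Deleting from disk only after the commit means a failed transaction never leaves rows pointing at files that no longer exist.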

pages - Static Content Management

Purpose: Editable static pages for the website (charte, about, etc.).

Columns:

| Column | Type | Description |
|---|---|---|
| id | INTEGER PK | Unique page ID |
| slug | TEXT UNIQUE | URL-friendly identifier |
| title | TEXT NOT NULL | Page title |
| content | TEXT | Markdown/HTML content |
| is_published | BOOLEAN | Visibility flag |
| created_at | DATETIME | Creation timestamp |
| updated_at | DATETIME | Last modification |

Pre-loaded Pages:

  • charte - Charter/guidelines
  • about - About the project
  • licenses - License information
  • contact - Contact information

Usage: Allows non-technical users to edit important static content without touching code.


Reference Tables (Predefined Lists)

orientations - Academic Orientations

Purpose: Predefined list of artistic/academic orientations at ERG.

Pre-loaded Values (15 total):

  1. Arts Numériques
  2. Dessin
  3. Cinéma d'animation
  4. Installation-Performance
  5. Peinture
  6. Photographie
  7. Sculpture
  8. Vidéographie
  9. Graphisme
  10. Typographie
  11. Design Numérique
  12. Illustration
  13. Bande-Dessinée
  14. Sérigraphie
  15. Gravure

Schema:

CREATE TABLE orientations (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL UNIQUE,
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);

Usage: Each thesis links to one orientation via theses.orientation_id.

ap_programs - Atelier Pratiques (AP)

Purpose: Predefined list of AP programs.

Pre-loaded Values (4 total):

  1. Narration Spéculative
  2. Design et Politique du Multiple (DPM)
  3. Atelier Pratiques Situées (APS)
  4. Lieux, Interdisciplinarités, Écologie, Nécessité, Systèmes (LIENS)

Schema:

CREATE TABLE ap_programs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL UNIQUE,
    code TEXT,  -- e.g., 'DPM', 'LIENS'
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);

Usage: Each thesis can optionally link to one AP program.

finality_types - Master Finality

Purpose: Type of master's degree finality.

Pre-loaded Values (3 total):

  1. Approfondi
  2. Enseignement
  3. Spécialisé

Usage: Each thesis links to one finality type.

languages - Thesis Languages

Purpose: Languages in which theses are written.

Pre-loaded Values:

  • Français
  • Anglais

Expandable: New languages can be added as needed.

Many-to-Many: A thesis can be multilingual (via thesis_languages).

format_types - Work Formats

Purpose: Physical/digital format of the thesis work.

Pre-loaded Values (7 total):

  1. Site web
  2. Audio
  3. Vidéo
  4. Performance
  5. Objet éditorial
  6. Installation
  7. Autre

Many-to-Many: A thesis can have multiple formats (e.g., "Vidéo + Objet éditorial").

keywords - Thesis Keywords

Purpose: Dynamic, expandable keyword system for categorization.

Schema:

CREATE TABLE keywords (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    keyword TEXT NOT NULL UNIQUE,
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);

Characteristics:

  • Starts empty, grows organically
  • No predefined list
  • Each keyword is unique across the database
  • Max 10 keywords per thesis (enforced in application)

Many-to-Many: Via thesis_keywords junction table.

access_types - Access Permissions

Purpose: Define how theses can be accessed.

Pre-loaded Values (3 types):

| Name | Description |
|---|---|
| Libre | Freely accessible online and in physical library |
| Interne | Physical access only; descriptive note online |
| Interdit | No access; descriptive note online only |

Important Business Rule: Access can be restricted but never opened.

  • Allowed: Libre → Interne → Interdit
  • Not allowed: Interdit → Interne or Libre

This must be enforced in application logic.
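
A minimal sketch of that application-side check, assuming the three names above are the only access types and that skipping a level (Libre straight to Interdit) counts as a valid restriction. ACCESS_LEVELS and can_change_access are hypothetical names.

```python
# Restriction levels, from most open to most restricted.
ACCESS_LEVELS = {"Libre": 0, "Interne": 1, "Interdit": 2}


def can_change_access(current, new):
    """Return True if moving from `current` to `new` keeps or tightens access."""
    return ACCESS_LEVELS[new] >= ACCESS_LEVELS[current]


print(can_change_access("Libre", "Interne"))   # restricting
print(can_change_access("Interdit", "Libre"))  # opening up
```

Call it before any UPDATE of theses.access_type_id and refuse the change when it returns False.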

license_types - Licensing Options

Purpose: Legal licensing information for theses.

Schema:

CREATE TABLE license_types (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL UNIQUE,
    description TEXT,
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);

Status: To be populated later (options still being determined as per specs).

Potential Values (examples):

  • CC BY 4.0
  • CC BY-SA 4.0
  • CC BY-NC 4.0
  • All Rights Reserved
  • Custom License

Junction Tables (Many-to-Many Relationships)

Junction tables enable many-to-many relationships between entities.

thesis_authors - Thesis ↔ Authors

Purpose: Link theses to their authors (can have multiple authors).

Schema:

CREATE TABLE thesis_authors (
    thesis_id INTEGER NOT NULL,
    author_id INTEGER NOT NULL,
    author_order INTEGER DEFAULT 1,  -- First author, second author, etc.
    PRIMARY KEY (thesis_id, author_id),
    FOREIGN KEY (thesis_id) REFERENCES theses(id) ON DELETE CASCADE,
    FOREIGN KEY (author_id) REFERENCES authors(id) ON DELETE CASCADE
);

Composite Primary Key: (thesis_id, author_id) ensures no duplicate pairings.

Ordering: author_order preserves author sequence for citation purposes.

Example:

-- Thesis with 2 authors
INSERT INTO thesis_authors (thesis_id, author_id, author_order) VALUES
    (1, 5, 1),  -- First author
    (1, 8, 2);  -- Second author
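
To read authors back in citation order, sort on author_order explicitly instead of relying on insertion order. A small sketch, using a cut-down stand-in for the two tables; authors_in_order is a hypothetical helper.

```python
import sqlite3


def authors_in_order(conn, thesis_id):
    """Return author names for a thesis, first author first."""
    rows = conn.execute(
        "SELECT a.name "
        "FROM thesis_authors ta "
        "JOIN authors a ON a.id = ta.author_id "
        "WHERE ta.thesis_id = ? "
        "ORDER BY ta.author_order, a.id",  # a.id breaks ties deterministically
        (thesis_id,),
    ).fetchall()
    return [name for (name,) in rows]


# Demo data (names are placeholders).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE thesis_authors (
        thesis_id INTEGER NOT NULL,
        author_id INTEGER NOT NULL,
        author_order INTEGER DEFAULT 1,
        PRIMARY KEY (thesis_id, author_id)
    );
    INSERT INTO authors (id, name) VALUES (5, 'Marie Dupont'), (8, 'Alice Néron');
    -- Rows inserted out of order on purpose.
    INSERT INTO thesis_authors (thesis_id, author_id, author_order)
    VALUES (1, 8, 2), (1, 5, 1);
""")
names = authors_in_order(conn, 1)
print(names)
```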

thesis_supervisors - Thesis ↔ Supervisors

Purpose: Link theses to their supervisors/promoters (can have multiple).

Schema: Similar to thesis_authors, includes supervisor_order.

Example:

-- Thesis with co-promoters
INSERT INTO thesis_supervisors (thesis_id, supervisor_id, supervisor_order) VALUES
    (1, 3, 1),  -- Primary promoter
    (1, 7, 2);  -- Co-promoter

thesis_languages - Thesis ↔ Languages

Purpose: Support multilingual theses.

Schema:

CREATE TABLE thesis_languages (
    thesis_id INTEGER NOT NULL,
    language_id INTEGER NOT NULL,
    PRIMARY KEY (thesis_id, language_id),
    FOREIGN KEY (thesis_id) REFERENCES theses(id) ON DELETE CASCADE,
    FOREIGN KEY (language_id) REFERENCES languages(id) ON DELETE CASCADE
);

Example:

-- Bilingual thesis (French + English)
INSERT INTO thesis_languages (thesis_id, language_id) VALUES
    (1, 1),  -- French
    (1, 2);  -- English

thesis_formats - Thesis ↔ Formats

Purpose: Support multi-format works.

Example Use Case: A thesis that is both a video and has an editorial object (book).

INSERT INTO thesis_formats (thesis_id, format_id) VALUES
    (10, 3),  -- Video
    (10, 5);  -- Objet éditorial

thesis_keywords - Thesis ↔ Keywords

Purpose: Tag theses with up to 10 keywords for discovery.

Business Rule: Maximum 10 keywords per thesis (enforce in application).

Example:

-- Add keywords to a thesis
INSERT INTO thesis_keywords (thesis_id, keyword_id) VALUES
    (1, 15),  -- "performance"
    (1, 22),  -- "urbanisme"
    (1, 8);   -- "sociologie"

Indexed for fast searching:

  • idx_thesis_keywords_thesis - Find all keywords for a thesis
  • idx_thesis_keywords_keyword - Find all theses for a keyword
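
Since the 10-keyword limit lives in the application, something like the following could guard every insert. This is a sketch under assumptions: add_keyword is a hypothetical helper, and the demo schema keeps only the columns it needs.

```python
import sqlite3

MAX_KEYWORDS = 10  # limit stated in this guide; not enforced by the schema


def add_keyword(conn, thesis_id, keyword):
    """Link `keyword` to a thesis. Returns False if it was already linked."""
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute(
            "INSERT OR IGNORE INTO keywords (keyword) VALUES (?)", (keyword,)
        )
        (keyword_id,) = conn.execute(
            "SELECT id FROM keywords WHERE keyword = ?", (keyword,)
        ).fetchone()
        linked = conn.execute(
            "SELECT 1 FROM thesis_keywords WHERE thesis_id = ? AND keyword_id = ?",
            (thesis_id, keyword_id),
        ).fetchone()
        if linked:
            return False
        (count,) = conn.execute(
            "SELECT COUNT(*) FROM thesis_keywords WHERE thesis_id = ?", (thesis_id,)
        ).fetchone()
        if count >= MAX_KEYWORDS:
            raise ValueError(f"thesis {thesis_id} already has {MAX_KEYWORDS} keywords")
        conn.execute(
            "INSERT INTO thesis_keywords (thesis_id, keyword_id) VALUES (?, ?)",
            (thesis_id, keyword_id),
        )
    return True


# Demo: ten keywords are accepted, the eleventh is rejected.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE keywords (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        keyword TEXT NOT NULL UNIQUE
    );
    CREATE TABLE thesis_keywords (
        thesis_id INTEGER NOT NULL,
        keyword_id INTEGER NOT NULL,
        PRIMARY KEY (thesis_id, keyword_id)
    );
""")
for n in range(MAX_KEYWORDS):
    add_keyword(conn, 1, f"keyword-{n}")
try:
    add_keyword(conn, 1, "one-too-many")
    rejected = False
except ValueError:
    rejected = True
linked_count = conn.execute(
    "SELECT COUNT(*) FROM thesis_keywords WHERE thesis_id = 1"
).fetchone()[0]
keyword_rows = conn.execute("SELECT COUNT(*) FROM keywords").fetchone()[0]
```

Because the check and the insert share one transaction, a rejected keyword is rolled back and does not linger in the keywords table.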

Views (Simplified Querying)

Views are pre-written queries that act like virtual tables.

v_theses_full - Complete Thesis Data

Purpose: Administrative view with all thesis information in one query.

What it does:

  • Joins all related tables
  • Concatenates multiple values (authors, supervisors, keywords, etc.)
  • Displays human-readable names instead of IDs

Columns: All thesis metadata plus:

  • authors - Comma-separated author names
  • supervisors - Comma-separated supervisor names
  • languages - Comma-separated language names
  • formats - Comma-separated format types
  • keywords - Comma-separated keywords
  • Plus all human-readable names (orientation, AP, finality, etc.)

Usage:

-- Get complete info for thesis #5
SELECT * FROM v_theses_full WHERE id = 5;

-- All theses from 2025 in Vidéographie
SELECT * FROM v_theses_full
WHERE year = 2025 AND orientation = 'Vidéographie';

Performance Note: This is a complex join. Use for admin interfaces, not high-traffic public pages.

v_theses_public - Published Theses Only

Purpose: Public-facing view showing only published theses.

What it does:

  • Same as v_theses_full
  • But filtered to is_published = 1

Usage:

-- Safe for public website
SELECT * FROM v_theses_public
WHERE year = 2025
ORDER BY title;

Security: Ensures unpublished theses are never exposed to public.


Automatic Features

Auto-Incrementing IDs

All single-column primary keys use AUTOINCREMENT (the junction tables use composite keys instead):

id INTEGER PRIMARY KEY AUTOINCREMENT

Benefit: You never need to specify IDs manually. SQLite handles it.

Example:

-- SQLite automatically assigns id = 1, 2, 3, etc.
INSERT INTO authors (name, email) VALUES ('Alice Néron', 'alice@example.com');

Automatic Timestamps

Creation Timestamps:

created_at DATETIME DEFAULT CURRENT_TIMESTAMP

Automatically set when a record is inserted.

Update Timestamps:

Triggers automatically update updated_at when records change:

CREATE TRIGGER update_theses_timestamp
AFTER UPDATE ON theses
BEGIN
    UPDATE theses SET updated_at = CURRENT_TIMESTAMP WHERE id = NEW.id;
END;

Benefit: Creation and last-modification times are recorded without manual date management.

Example:

-- created_at is set automatically
INSERT INTO authors (name) VALUES ('Bob Smith');

-- updated_at is set automatically on update
UPDATE authors SET email = 'bob@newmail.com' WHERE id = 1;
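
The trigger's effect is easy to see in isolation. The sketch below recreates the trigger shown above on a reduced theses table (an assumption: the real table has many more columns) and starts from a deliberately old timestamp so the change is visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE theses (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        title TEXT NOT NULL,
        created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
        updated_at DATETIME DEFAULT CURRENT_TIMESTAMP
    );
    CREATE TRIGGER update_theses_timestamp
    AFTER UPDATE ON theses
    BEGIN
        UPDATE theses SET updated_at = CURRENT_TIMESTAMP WHERE id = NEW.id;
    END;
""")

# Insert a row with old timestamps so the trigger's effect stands out.
conn.execute(
    "INSERT INTO theses (title, created_at, updated_at) "
    "VALUES ('Demo', '2000-01-01 00:00:00', '2000-01-01 00:00:00')"
)
# Any UPDATE fires the trigger, which rewrites updated_at.
conn.execute("UPDATE theses SET title = 'Demo (revised)' WHERE id = 1")

created_at, updated_at = conn.execute(
    "SELECT created_at, updated_at FROM theses WHERE id = 1"
).fetchone()
print(created_at, updated_at)
```

The UPDATE inside the trigger does not fire the trigger again, because SQLite's recursive_triggers setting is off by default.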

Cascade Deletes

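When you delete a thesis, all related records are removed automatically, provided foreign key enforcement is on for the connection (PRAGMA foreign_keys = ON; SQLite leaves it off by default):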

FOREIGN KEY (thesis_id) REFERENCES theses(id) ON DELETE CASCADE

Affected Tables:

  • thesis_authors
  • thesis_supervisors
  • thesis_languages
  • thesis_formats
  • thesis_keywords
  • thesis_files

Example:

-- This also deletes all associated authors, keywords, files, etc.
DELETE FROM theses WHERE id = 10;

Warning: This is permanent and cannot be undone!
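
Cascading only happens when foreign key enforcement is switched on for the connection doing the delete. The sketch below shows the difference on a two-table stand-in schema; links_left_after_delete is a hypothetical helper.

```python
import sqlite3

SCHEMA = """
CREATE TABLE theses (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE thesis_keywords (
    thesis_id INTEGER NOT NULL,
    keyword_id INTEGER NOT NULL,
    PRIMARY KEY (thesis_id, keyword_id),
    FOREIGN KEY (thesis_id) REFERENCES theses(id) ON DELETE CASCADE
);
INSERT INTO theses (id, title) VALUES (10, 'Demo');
INSERT INTO thesis_keywords (thesis_id, keyword_id) VALUES (10, 1), (10, 2);
"""


def links_left_after_delete(enforce_foreign_keys):
    """Delete thesis 10 and return how many thesis_keywords rows survive."""
    conn = sqlite3.connect(":memory:")
    # The pragma is per connection and has no effect inside an open
    # transaction, so set it right after connecting.
    setting = "ON" if enforce_foreign_keys else "OFF"
    conn.execute(f"PRAGMA foreign_keys = {setting}")
    conn.executescript(SCHEMA)
    conn.execute("DELETE FROM theses WHERE id = 10")
    (left,) = conn.execute("SELECT COUNT(*) FROM thesis_keywords").fetchone()
    conn.close()
    return left


with_enforcement = links_left_after_delete(True)
without_enforcement = links_left_after_delete(False)
print(with_enforcement, without_enforcement)
```

With enforcement off, the junction rows are left behind as orphans, so every application connection should set the pragma right after opening the database.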


Common Operations

Querying

Basic Queries

# Enter SQLite shell
sqlite3 posterg.db

# List all tables
.tables

# Show table structure
.schema theses

# Pretty output
.mode column
.headers on

# Run a query
SELECT * FROM orientations;

# Exit
.quit

Find Published Theses

SELECT title, year, authors, orientation
FROM v_theses_public
WHERE year >= 2024
ORDER BY year DESC, title;

Search by Keyword

-- Lists every keyword of each thesis that has at least one matching keyword
SELECT t.title, t.year, GROUP_CONCAT(k.keyword) AS keywords
FROM theses t
JOIN thesis_keywords tk ON t.id = tk.thesis_id
JOIN keywords k ON tk.keyword_id = k.id
GROUP BY t.id
HAVING SUM(k.keyword LIKE '%performance%') > 0;

Find Theses by Author

SELECT t.title, t.year, a.name as author
FROM theses t
JOIN thesis_authors ta ON t.id = ta.thesis_id
JOIN authors a ON ta.author_id = a.id
WHERE a.name LIKE '%Lucie%'
ORDER BY t.year DESC;

Get Unpublished Theses (Admin)

SELECT identifier, title, submitted_at, defense_date
FROM theses
WHERE submitted_at IS NOT NULL
  AND is_published = 0
ORDER BY submitted_at DESC;

Inserting Data

Add a New Author

INSERT INTO authors (name, email) VALUES
    ('Marie Dupont', 'marie.dupont@example.com');

Add a New Thesis (Basic)

INSERT INTO theses (
    identifier, title, year, orientation_id, finality_id, synopsis
) VALUES (
    '2026-001',
    'Mon Titre de TFE',
    2026,
    8,  -- Vidéographie
    1,  -- Approfondi
    'Un synopsis fascinant de mon travail...'
);

-- Then link the thesis to its author. The IDs below are placeholders:
-- look up the real thesis ID and author ID first.
INSERT INTO thesis_authors (thesis_id, author_id, author_order)
VALUES (1, 5, 1);
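
In application code the thesis ID does not have to be looked up by hand: the cursor reports it through lastrowid. The sketch below inserts a thesis and its author links in one transaction. create_thesis is a hypothetical helper, and matching existing authors by exact name is an assumption made for brevity (the schema does not make names unique).

```python
import sqlite3


def create_thesis(conn, identifier, title, year, author_names):
    """Insert a thesis plus its author links atomically; return the new thesis ID."""
    with conn:  # commit on success, roll back if any statement fails
        cur = conn.execute(
            "INSERT INTO theses (identifier, title, year) VALUES (?, ?, ?)",
            (identifier, title, year),
        )
        thesis_id = cur.lastrowid  # ID SQLite assigned to the new row
        for order, name in enumerate(author_names, start=1):
            row = conn.execute(
                "SELECT id FROM authors WHERE name = ?", (name,)
            ).fetchone()
            if row is None:
                author_id = conn.execute(
                    "INSERT INTO authors (name) VALUES (?)", (name,)
                ).lastrowid
            else:
                author_id = row[0]
            conn.execute(
                "INSERT INTO thesis_authors (thesis_id, author_id, author_order) "
                "VALUES (?, ?, ?)",
                (thesis_id, author_id, order),
            )
    return thesis_id


# Demo on a reduced stand-in schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE theses (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        identifier TEXT UNIQUE,
        title TEXT NOT NULL,
        year INTEGER NOT NULL
    );
    CREATE TABLE authors (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        name TEXT NOT NULL
    );
    CREATE TABLE thesis_authors (
        thesis_id INTEGER NOT NULL,
        author_id INTEGER NOT NULL,
        author_order INTEGER DEFAULT 1,
        PRIMARY KEY (thesis_id, author_id)
    );
""")
new_id = create_thesis(
    conn, "2026-001", "Mon Titre de TFE", 2026, ["Marie Dupont", "Alice Néron"]
)
links = conn.execute(
    "SELECT thesis_id, author_id, author_order FROM thesis_authors "
    "ORDER BY author_order"
).fetchall()
```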

Add Keywords to Thesis

-- First, ensure keyword exists
INSERT OR IGNORE INTO keywords (keyword) VALUES ('performance');

-- Then link it
INSERT INTO thesis_keywords (thesis_id, keyword_id)
SELECT 1, id FROM keywords WHERE keyword = 'performance';

Updating Data

Update Thesis Status to Published

UPDATE theses
SET is_published = 1,
    published_at = CURRENT_TIMESTAMP
WHERE id = 5;

Add Jury Points and Note

UPDATE theses
SET jury_points = 16.5,
    context_note = 'Ce travail remarquable explore...',
    jury_note_added = 1
WHERE id = 5;

Restrict Access (Libre → Interne)

UPDATE theses
SET access_type_id = (SELECT id FROM access_types WHERE name = 'Interne')
WHERE id = 10;

Update Page Content

UPDATE pages
SET content = 'Nouveau contenu de la page...',
    updated_at = CURRENT_TIMESTAMP
WHERE slug = 'about';

Deleting Data

Warning: Deletes are permanent in SQLite!

-- This cascades to thesis_authors, thesis_keywords, etc.
DELETE FROM theses WHERE id = 10;

Remove Keyword from Thesis

DELETE FROM thesis_keywords
WHERE thesis_id = 5 AND keyword_id = 12;

Delete Unused Keywords

-- Remove keywords not linked to any thesis
DELETE FROM keywords
WHERE id NOT IN (SELECT DISTINCT keyword_id FROM thesis_keywords);

Backup & Maintenance

Backup Strategies

Method 1: File Copy (Simplest)

# Copy the database file (safe only while no process is writing to it)
cp posterg.db posterg_backup_$(date +%Y%m%d).db

# Or with compression
tar -czf posterg_backup_$(date +%Y%m%d).tar.gz posterg.db

Method 2: SQL Dump (Most Portable)

# Export entire database to SQL
sqlite3 posterg.db .dump > posterg_backup.sql

# Restore from backup
sqlite3 new_posterg.db < posterg_backup.sql

Method 3: Automated Backups

Create a backup script (backup.sh):

#!/bin/bash
BACKUP_DIR="/home/padlock/dev/posterg/db/backups"
DATE=$(date +%Y%m%d_%H%M%S)
DB_FILE="/home/padlock/dev/posterg/db/posterg.db"

mkdir -p "$BACKUP_DIR"
sqlite3 "$DB_FILE" ".backup '$BACKUP_DIR/posterg_$DATE.db'"
echo "Backup created: $BACKUP_DIR/posterg_$DATE.db"

# Keep only last 30 backups
ls -t "$BACKUP_DIR"/posterg_*.db | tail -n +31 | xargs rm -f

Run daily with cron:

# Edit crontab
crontab -e

# Add daily backup at 2am
0 2 * * * /home/padlock/dev/posterg/db/backup.sh

Database Maintenance

Optimize Database (Vacuum)

Reclaim unused space and optimize performance:

sqlite3 posterg.db "VACUUM;"

When to run: After large deletions or monthly.

Analyze Database

Update query optimizer statistics:

sqlite3 posterg.db "ANALYZE;"

When to run: After significant data changes.

Check Integrity

Verify database integrity:

sqlite3 posterg.db "PRAGMA integrity_check;"

Expected output: ok

Database Statistics

-- Database size
SELECT page_count * page_size / 1024.0 / 1024.0 AS size_mb
FROM pragma_page_count(), pragma_page_size();

-- Row counts
SELECT 'theses' as table_name, COUNT(*) as rows FROM theses
UNION ALL
SELECT 'authors', COUNT(*) FROM authors
UNION ALL
SELECT 'keywords', COUNT(*) FROM keywords;

-- List indexes by table
SELECT name, tbl_name FROM sqlite_master
WHERE type = 'index'
ORDER BY tbl_name;

Migration Best Practices

When updating the schema:

  1. Always backup first:

    cp posterg.db posterg_before_migration.db
    
  2. Test migration on backup:

    sqlite3 posterg_test.db < migration.sql
    
  3. Use transactions:

    BEGIN TRANSACTION;
    -- Your changes here
    ALTER TABLE theses ADD COLUMN new_field TEXT;
    -- Test queries
    SELECT * FROM theses LIMIT 1;
    COMMIT;  -- or ROLLBACK if something went wrong
    
  4. Document changes: Create migration files like migrations/001_add_new_field.sql
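
One lightweight way to track which migrations have run is SQLite's built-in user_version pragma. The sketch below is an assumption about how the steps above could be automated, not the project's actual tooling: MIGRATIONS and migrate are hypothetical names, and the single migration is the ALTER TABLE example from step 3.

```python
import sqlite3

# Ordered list of migrations; position in the list is the schema version.
MIGRATIONS = [
    "ALTER TABLE theses ADD COLUMN new_field TEXT;",  # version 1
]


def migrate(conn):
    """Apply pending migrations, one transaction each; return the final version."""
    (current,) = conn.execute("PRAGMA user_version").fetchone()
    for version, sql in enumerate(MIGRATIONS, start=1):
        if version <= current:
            continue  # already applied
        try:
            # The schema change and the version bump commit together.
            conn.executescript(
                f"BEGIN;\n{sql}\nPRAGMA user_version = {version};\nCOMMIT;"
            )
        except sqlite3.Error:
            if conn.in_transaction:
                conn.rollback()  # leave the database at the previous version
            raise
    return conn.execute("PRAGMA user_version").fetchone()[0]


# Demo on a reduced stand-in table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE theses (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
first_run = migrate(conn)
second_run = migrate(conn)  # nothing left to apply
columns = [row[1] for row in conn.execute("PRAGMA table_info(theses)")]
print(first_run, second_run, columns)
```

Running migrate repeatedly is safe, because anything at or below the stored version is skipped.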


Troubleshooting

Common Issues

Database is Locked

Symptom: Error: database is locked

Cause: Another process has the database open for writing.

Solution:

# Find processes using the database
lsof posterg.db

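# Or, as a last resort, kill every process holding the file open
# (uncommitted work in those processes is lost)
fuser -k posterg.db

# Reduce lock contention by switching to WAL mode
sqlite3 posterg.db "PRAGMA journal_mode=WAL;"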
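
From application code, the usual remedy is to let connections wait for a lock instead of failing at once. A minimal sketch with Python's sqlite3 module follows; open_db is a hypothetical helper, and the demo uses a temporary file because WAL mode is not available for in-memory databases.

```python
import os
import sqlite3
import tempfile


def open_db(path, busy_seconds=5.0):
    """Open a connection that waits for locks and uses WAL journaling."""
    # timeout: how long a statement waits for a lock before raising
    # "database is locked" (Python's default is already 5 seconds).
    conn = sqlite3.connect(path, timeout=busy_seconds)
    # journal_mode is stored in the database file; the pragma returns
    # the mode actually in effect.
    mode = conn.execute("PRAGMA journal_mode = WAL").fetchone()[0]
    # foreign_keys is per connection, so set it on every open.
    conn.execute("PRAGMA foreign_keys = ON")
    return conn, mode


workdir = tempfile.mkdtemp()
db_path = os.path.join(workdir, "demo.db")
conn, journal_mode = open_db(db_path)
fk_enabled = conn.execute("PRAGMA foreign_keys").fetchone()[0]
conn.close()
print(journal_mode, fk_enabled)
```

WAL lets readers continue while a write is in progress, but there is still only one writer at a time, so long write transactions can still make other writers wait.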

Foreign Key Violations

Symptom: FOREIGN KEY constraint failed

Cause: Trying to insert a reference to a non-existent record.

Solution:

-- Enforcement is per connection; check whether it is on (1) or off (0)
PRAGMA foreign_keys;

-- Turn it on for the current connection
PRAGMA foreign_keys = ON;

-- Verify referenced record exists
SELECT id FROM orientations WHERE id = 8;

Unique Constraint Violation

Symptom: UNIQUE constraint failed

Solution:

-- Use INSERT OR IGNORE to skip duplicates
INSERT OR IGNORE INTO keywords (keyword) VALUES ('performance');

-- Or INSERT OR REPLACE to overwrite (the conflicting row is deleted and a new one inserted)
INSERT OR REPLACE INTO keywords (id, keyword) VALUES (1, 'performance');

Cannot Find Database File

Symptom: Error: unable to open database file

Solution:

# Use absolute path
sqlite3 /home/padlock/dev/posterg/db/posterg.db

# Or navigate to directory first
cd /home/padlock/dev/posterg/db
sqlite3 posterg.db

Performance Issues

Slow Queries

Diagnosis:

-- Enable query timer
.timer on

-- Explain query plan
EXPLAIN QUERY PLAN
SELECT * FROM theses WHERE year = 2025;

Solutions:

  • Add indexes on frequently queried columns
  • Use views for complex queries
  • Run ANALYZE; to update statistics
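
To confirm that an index is actually picked up, compare the query plan before and after creating it. The sketch below does this from Python on a reduced theses table; query_plan is a hypothetical helper, and the exact plan wording varies between SQLite versions.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # detail is the last column


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE theses ("
    "id INTEGER PRIMARY KEY, title TEXT NOT NULL, year INTEGER NOT NULL)"
)
query = "SELECT * FROM theses WHERE year = ?"

before = query_plan(conn, query, (2025,))  # no index yet: full table scan
conn.execute("CREATE INDEX idx_theses_year ON theses(year)")
after = query_plan(conn, query, (2025,))   # should now mention the index

print(before)
print(after)
```

A plan that still says SCAN after the index exists usually means the WHERE clause wraps the column in a function or expression, which prevents index use.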

Large Database

Solutions:

# Reclaim space left behind by deleted rows
sqlite3 posterg.db "VACUUM;"

# Use WAL mode for better concurrency
sqlite3 posterg.db "PRAGMA journal_mode=WAL;"

# Archive old theses to separate database

Data Quality Issues

Find Orphaned Records

-- Authors with no theses
SELECT a.* FROM authors a
LEFT JOIN thesis_authors ta ON a.id = ta.author_id
WHERE ta.author_id IS NULL;

-- Theses missing required fields
SELECT id, identifier, title FROM theses
WHERE orientation_id IS NULL OR finality_id IS NULL;

Validate Keyword Count

-- Theses with more than 10 keywords
SELECT thesis_id, COUNT(*) as keyword_count
FROM thesis_keywords
GROUP BY thesis_id
HAVING keyword_count > 10;

Recovery Procedures

Restore from Backup

# From SQL dump
sqlite3 posterg_restored.db < posterg_backup.sql

# From database file
cp posterg_backup_20260127.db posterg.db

Corrupted Database

# Try to recover
sqlite3 posterg.db ".recover" | sqlite3 recovered.db

# Or dump and reimport
sqlite3 posterg.db .dump | sqlite3 new_posterg.db

Advanced Tips

Performance Optimization

-- Enable Write-Ahead Logging (WAL) for better concurrency
PRAGMA journal_mode=WAL;

-- Increase cache size (in KB)
PRAGMA cache_size=-64000;  -- 64MB cache

-- Enable memory-mapped I/O (in bytes)
PRAGMA mmap_size=268435456;  -- 256MB

-- Synchronous mode (less safe but faster)
PRAGMA synchronous=NORMAL;  -- Default is FULL

Useful SQLite Commands

-- Export table to CSV
.mode csv
.output theses.csv
SELECT * FROM v_theses_public;
.output stdout

-- Import CSV
.mode csv
.import data.csv table_name

-- Show execution time
.timer on

-- Show query plan
.eqp on

-- Pretty formatting
.mode column
.headers on
.width 10 40 20

-- Write a copy of the open database to a file (.save is an alias for .backup)
.save posterg_copy.db

Custom Functions (Application Level)

When building your application, you can create custom SQLite functions:

Python example:

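import sqlite3

def keyword_count(keywords):
    """Count the entries in a comma-separated keyword string (NULL counts as 0)."""
    if not keywords:
        return 0
    return len([k for k in keywords.split(',') if k.strip()])

conn = sqlite3.connect('posterg.db')
# Register the Python function under the SQL name keyword_count (1 argument)
conn.create_function('keyword_count', 1, keyword_count)
# Usage: SELECT title, keyword_count(keywords) FROM v_theses_full;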

Next Steps

After setting up the database:

  1. Import existing data from Database_TFE_test.csv

    • Create import script (Python/Node.js recommended)
    • Parse CSV and map to schema
    • Handle comma-separated values
    • Validate data quality
  2. Define license types

    • Consult with legal/admin
    • Populate license_types table
  3. Build application layer

    • REST API or GraphQL
    • Authentication/authorization
    • File upload handling
    • Email notifications
  4. Create admin interface

    • CRUD operations for all entities
    • Bulk import/export
    • User management
    • Workflow management
  5. Build public website

    • Search and filter
    • Thesis display
    • Respect access controls
    • Static pages management

Resources

SQLite Documentation

Tools

Best Practices

  • Always use transactions for multiple operations
  • Enable foreign keys: PRAGMA foreign_keys = ON;
  • Backup before schema changes
  • Use prepared statements in applications
  • Index frequently queried columns
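
As an illustration of the prepared-statement advice, the sketch below runs the author search from Common Operations with a bound parameter instead of string concatenation. find_theses_by_author is a hypothetical helper, and the demo schema and rows are placeholders.

```python
import sqlite3


def find_theses_by_author(conn, name_fragment):
    """Return (title, year, author) rows whose author name contains the fragment."""
    pattern = f"%{name_fragment}%"
    return conn.execute(
        "SELECT t.title, t.year, a.name "
        "FROM theses t "
        "JOIN thesis_authors ta ON t.id = ta.thesis_id "
        "JOIN authors a ON ta.author_id = a.id "
        "WHERE a.name LIKE ? "  # the value is bound, never spliced into the SQL
        "ORDER BY t.year DESC",
        (pattern,),
    ).fetchall()


# Demo data (placeholder names).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE theses (id INTEGER PRIMARY KEY, title TEXT NOT NULL, year INTEGER NOT NULL);
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE thesis_authors (
        thesis_id INTEGER NOT NULL,
        author_id INTEGER NOT NULL,
        PRIMARY KEY (thesis_id, author_id)
    );
    INSERT INTO theses (id, title, year) VALUES (1, 'Demo', 2025);
    INSERT INTO authors (id, name) VALUES (1, 'Lucie Example');
    INSERT INTO thesis_authors (thesis_id, author_id) VALUES (1, 1);
""")
matches = find_theses_by_author(conn, "Lucie")
# A classic injection attempt is treated as plain text and matches nothing.
hostile = find_theses_by_author(conn, "x' OR '1'='1")
print(matches, hostile)
```

The % and _ characters in user input still act as LIKE wildcards; escape them if literal matching matters.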

Support

For issues related to:

  • Schema design: Review this document and README.md
  • Data import: Check CSV format and data types
  • Performance: Run ANALYZE and check indexes
  • Corruption: Restore from backup

Last Updated: 2026-01-27
Schema Version: 1.0
Database: SQLite 3