# CSV Import Format Specification

## File Format

- **Encoding**: UTF-8
- **Delimiter**: Comma (`,`)
- **Header Rows**: The first 4 rows are skipped during import
  - Row 1: Empty
  - Row 2: Headers (French labels)
  - Row 3: Description row
  - Row 4: Column names
- **Data Rows**: Start at row 5

## Column Structure

The CSV must contain exactly 21 columns, in this order:

| Index | Field Name | Required | Type | Description |
|-------|------------|----------|------|-------------|
| 0 | identifier | No | String | Unique identifier for the thesis |
| 1 | title | **Yes** | String | Thesis title |
| 2 | subtitle | No | String | Thesis subtitle |
| 3 | authors | No | String | Author(s), comma-separated for multiple |
| 4 | contact | No | String | Contact email (associated with the first author) |
| 5 | supervisors | No | String | Supervisor(s), comma-separated for multiple |
| 6 | formats | No | String | Format(s), comma-separated for multiple |
| 7 | year | **Yes** | Integer | Year of the thesis (e.g., 2024) |
| 8 | ap | No | String | AP program code (see AP Codes section) |
| 9 | orientation | No | String | Orientation code (see Orientation Codes section) |
| 10 | finality | No | String | Finality name |
| 11 | keywords | No | String | Keywords, comma-separated (max 10) |
| 12 | synopsis | No | Text | Synopsis/abstract of the thesis |
| 13 | context | No | Text | Context note |
| 14 | remarks | No | Text | Additional remarks |
| 15 | language | No | String | Language (e.g., Français, English, Nederlands) |
| 16 | access | No | String | Access authorization |
| 17 | license | No | String | License information |
| 18 | size_info | No | String | File size information |
| 19 | jury_points | No | Float | Jury score (out of 20) |
| 20 | baiu_link | No | String | Link to BAIU (institutional archive) |

## Field Details

### Required Fields

- **title**: Must not be empty
- **year**: Must not be empty and must be a valid integer

### Multi-Value Fields

These fields accept multiple values separated by commas. Because the comma is also the column delimiter, a field holding several values must be enclosed in double quotes:

- **authors**: e.g., `"John Doe, Jane Smith"`
- **supervisors**: e.g., `"Prof. A, Prof. B"`
- **keywords**: Maximum 10 keywords, e.g., `"art, design, digital"`
- **formats**: e.g., `"PDF, Video, Installation"`

### Orientation Codes

Valid orientation codes and their full names:

```
SC = Sculpture
VI = Vidéographie
CA = Cinéma d'animation
IP = Installation-Performance
PE = Peinture
PH = Photographie
DE = Dessin
AN = Arts Numériques
GR = Graphisme
TY = Typographie
DN = Design Numérique
IL = Illustration
BD = Bande-Dessinée
SE = Sérigraphie
GV = Gravure
```

### AP Codes

Valid AP program codes:

- `DPM`
- `LIENS`
- `APS`

These codes must match exactly the codes stored in the `ap_programs` table.

### Language Values

Provide languages with an initial capital letter:

- `Français`
- `English`
- `Nederlands`
- etc.

### Format Values

Common format values (case-insensitive; normalized on import):

- `PDF`
- `Video`
- `Audio`
- `Installation`
- `Web`
- etc.

## Import Behavior

### Row Processing

1. Empty rows (no title and no identifier) are skipped
2. Each row is processed in its own transaction
3. If a row fails, it is skipped and logged, and processing continues with the next row

### Data Validation

- If title or year is missing, or year is not a valid integer, the row is rejected
- Invalid orientation codes result in no orientation being set (`NULL`)
- Invalid AP codes result in no AP program being set (`NULL`)
- If more than 10 keywords are provided, only the first 10 are kept

### Data Normalization

- All string fields are trimmed of leading and trailing whitespace
- Language and format values are normalized (first letter capitalized, rest lowercase)
- Empty strings are stored as `NULL` in the database

### Entity Creation

- Authors, supervisors, and keywords are created automatically if they do not already exist
- Existing authors are matched by name
- The contact email is associated only with the first author

## Example CSV Structure

The example below shows only the French header row (row 2) followed by two data rows. Because the first four rows are always skipped, a real import file must also contain row 1 (empty), row 3 (description) and row 4 (column names) before the data; otherwise data rows are consumed as header rows.

```csv
Identifiant,Titre,Sous-titre,Auteur·ice(s),Contact,Promoteur·ice(s),Format,Année,AP,Orientation,Finalité,Mots-clés,Synopsis,Contexte,Remarques,Langue,Autorisation,License,taille,Points sur 20,lien BAIU
TFE-2024-001,Mon projet artistique,Exploration du numérique,"Alice Dupont, Bob Martin",alice@example.com,Prof. Smith,PDF,2024,DPM,AN,Création,"art numérique, digital art, interactive installation",Un projet explorant l'intersection de l'art et de la technologie,Réalisé dans le cadre du master,Très bon projet,Français,Public,CC-BY,250MB,16.5,https://baiu.example.org/12345
TFE-2024-002,Design graphique moderne,,Charlie Brown,charlie@example.com,"Prof. A, Prof. B","PDF, Print",2024,LIENS,GR,Design,"typographie, graphisme, design",Une exploration de la typographie contemporaine,,,English,Restricted,All rights reserved,50MB,15,
```

## Troubleshooting

### Common Issues

1. **Encoding problems**: Save the file as UTF-8
2. **Missing columns**: All 21 columns must be present, even if empty
3. **Commas in fields**: Enclose multi-value fields (and any other field containing a comma) in double quotes; otherwise each value is read as a separate column and the row no longer has 21 columns
4. **Line breaks in fields**: Enclose fields containing line breaks in double quotes
5. **Quote escaping**: Inside a quoted field, write a literal double quote as two double quotes (`""`)

### Import Results

After import, the system displays:

- The number of theses successfully imported
- The number of rows skipped due to errors
- Detailed line-by-line results with success (✓) or error (✗) indicators

## Notes

- The import process preserves the order of authors, supervisors, and keywords
- The first author gets the contact email, if one is provided
- Duplicate detection is not performed: each import creates new entries
- Failed rows do not stop the import process
- All errors are logged to the server error log
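
The row-selection rules from *File Format*, *Column Structure* and *Row Processing* can be sketched in Python as follows. This is a minimal illustration, not the importer's actual code: the function `split_rows` and the choice to report a wrong column count as a per-row error (rather than aborting the import) are assumptions, chosen to match the rule that failed rows do not stop the import. Row numbers here count CSV records, so a quoted field spanning several physical lines still counts as one row.

```python
import csv
import io

HEADER_ROWS = 4        # rows 1-4: empty, French labels, description, column names
EXPECTED_COLUMNS = 21  # see the Column Structure table


def split_rows(text: str) -> tuple[list[tuple[int, list[str]]], list[tuple[int, str]]]:
    """Return (data_rows, errors), each entry keyed by its 1-based row number.

    Hypothetical helper: illustrates the documented rules, not the real importer.
    """
    data_rows: list[tuple[int, list[str]]] = []
    errors: list[tuple[int, str]] = []
    # newline="" lets the csv module handle line breaks inside quoted fields
    reader = csv.reader(io.StringIO(text, newline=""))
    for row_number, cells in enumerate(reader, start=1):
        if row_number <= HEADER_ROWS:
            continue  # the first four rows are always skipped
        identifier = cells[0].strip() if len(cells) > 0 else ""
        title = cells[1].strip() if len(cells) > 1 else ""
        if not identifier and not title:
            continue  # empty row: no title and no identifier
        if len(cells) != EXPECTED_COLUMNS:
            # Assumption: a wrong column count fails this row only.
            errors.append((row_number, f"expected {EXPECTED_COLUMNS} columns, got {len(cells)}"))
            continue
        data_rows.append((row_number, cells))
    return data_rows, errors
```

The empty-row check runs before the column-count check so that a blank line or a row of bare commas is skipped silently instead of being reported as an error.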
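
Per-row validation and normalization (*Required Fields*, *Multi-Value Fields*, *Data Validation*, *Data Normalization*) might look like the sketch below. The name `parse_row`, the exact-match (case-sensitive) code lookups, and the handling of a non-numeric `jury_points` value are assumptions; the real importer checks AP codes against the `ap_programs` table rather than a hard-coded set. Taken literally, the capitalization rule turns `PDF` into `Pdf`.

```python
from typing import Optional

FIELDS = [
    "identifier", "title", "subtitle", "authors", "contact", "supervisors",
    "formats", "year", "ap", "orientation", "finality", "keywords",
    "synopsis", "context", "remarks", "language", "access", "license",
    "size_info", "jury_points", "baiu_link",
]
ORIENTATION_CODES = {
    "SC", "VI", "CA", "IP", "PE", "PH", "DE", "AN",
    "GR", "TY", "DN", "IL", "BD", "SE", "GV",
}
# Assumption: stands in for a lookup in the ap_programs table.
AP_CODES = {"DPM", "LIENS", "APS"}
MAX_KEYWORDS = 10


def split_multi(value: Optional[str]) -> list[str]:
    """Split a comma-separated field into trimmed, non-empty values."""
    if value is None:
        return []
    return [part.strip() for part in value.split(",") if part.strip()]


def parse_row(cells: list[str]) -> dict:
    """Validate and normalize one data row; raise ValueError to reject it."""
    if len(cells) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} columns, got {len(cells)}")
    # Trim every field and turn empty strings into None (stored as NULL).
    row = {name: cell.strip() or None for name, cell in zip(FIELDS, cells)}

    # Required fields
    if row["title"] is None:
        raise ValueError("title is required")
    if row["year"] is None:
        raise ValueError("year is required")
    try:
        row["year"] = int(row["year"])
    except ValueError:
        raise ValueError(f"year is not a valid integer: {row['year']!r}") from None

    # Multi-value fields keep their original order; keywords are capped at 10.
    for name in ("authors", "supervisors", "keywords"):
        row[name] = split_multi(row[name])
    row["keywords"] = row["keywords"][:MAX_KEYWORDS]

    # First letter capitalized, rest lowercase.
    row["formats"] = [value.capitalize() for value in split_multi(row["formats"])]
    if row["language"] is not None:
        row["language"] = row["language"].capitalize()

    # Unknown codes become None instead of rejecting the row.
    if row["orientation"] not in ORIENTATION_CODES:
        row["orientation"] = None
    if row["ap"] not in AP_CODES:
        row["ap"] = None

    # Assumption: the spec does not say how a non-numeric score is handled.
    if row["jury_points"] is not None:
        try:
            row["jury_points"] = float(row["jury_points"])
        except ValueError:
            row["jury_points"] = None
    return row
```

Raising `ValueError` models a rejected row; the caller would log it and move on to the next row, in line with the Row Processing rules.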
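
When an import file is generated programmatically, a CSV library can apply the quoting rules listed under *Troubleshooting*. The sketch below uses Python's standard `csv` module, whose default `QUOTE_MINIMAL` mode quotes any field containing the delimiter, a double quote or a line break, and doubles embedded quotes. `build_import_csv` is a hypothetical helper; using the English field names from the Column Structure table for row 4 is an assumption, and since the content of the description row (row 3) is not specified, it is left blank by default.

```python
import csv
import io

FRENCH_LABELS = [
    "Identifiant", "Titre", "Sous-titre", "Auteur·ice(s)", "Contact",
    "Promoteur·ice(s)", "Format", "Année", "AP", "Orientation", "Finalité",
    "Mots-clés", "Synopsis", "Contexte", "Remarques", "Langue",
    "Autorisation", "License", "taille", "Points sur 20", "lien BAIU",
]
# Assumption: row 4 holds the field names from the Column Structure table.
COLUMN_NAMES = [
    "identifier", "title", "subtitle", "authors", "contact", "supervisors",
    "formats", "year", "ap", "orientation", "finality", "keywords",
    "synopsis", "context", "remarks", "language", "access", "license",
    "size_info", "jury_points", "baiu_link",
]


def build_import_csv(records, descriptions=None):
    """Return CSV text: the four preamble rows followed by the records."""
    buffer = io.StringIO(newline="")
    # Default QUOTE_MINIMAL: quotes fields containing ",", '"' or line breaks,
    # and writes embedded double quotes as "".
    writer = csv.writer(buffer)
    writer.writerow([])  # row 1: empty
    writer.writerow(FRENCH_LABELS)  # row 2: French labels
    writer.writerow(descriptions or [""] * len(COLUMN_NAMES))  # row 3: description
    writer.writerow(COLUMN_NAMES)  # row 4: column names
    for record in records:
        if len(record) != len(COLUMN_NAMES):
            raise ValueError(f"expected {len(COLUMN_NAMES)} values, got {len(record)}")
        writer.writerow(record)
    return buffer.getvalue()
```

To save the result, open the target file with `open(path, "w", encoding="utf-8", newline="")` so that the output is UTF-8 and the `\r\n` line endings written by the `csv` module are not translated again.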