xamxam/docs/IMPORT.md
2026-02-05 17:37:07 +01:00

CSV Import Format Specification

File Format

  • Encoding: UTF-8
  • Delimiter: Comma (,)
  • Header Rows: First 4 rows are skipped during import
    • Row 1: Empty
    • Row 2: Headers (French labels)
    • Row 3: Description row
    • Row 4: Column names
  • Data Rows: Start from row 5 onwards
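
As a rough illustration of the layout above, the following Python sketch parses the text of a file and drops the four header rows. The names `HEADER_ROWS` and `read_data_rows` are invented for this example and are not taken from the importer itself.

```python
import csv
import io

HEADER_ROWS = 4  # rows 1-4 are skipped; data starts at row 5


def read_data_rows(text):
    """Yield (row_number, fields) for every data row in the CSV text.

    row_number counts CSV records starting at 1, so the first data row is 5.
    """
    reader = csv.reader(io.StringIO(text), delimiter=",")
    for row_number, fields in enumerate(reader, start=1):
        if row_number <= HEADER_ROWS:
            continue  # empty row, French labels, descriptions, column names
        yield row_number, fields
```

When reading from disk, opening the file with `open(path, encoding="utf-8", newline="")` matches the UTF-8 requirement and lets the `csv` module handle line breaks inside quoted fields.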

Column Structure

The CSV must contain exactly 21 columns in this order:

| Index | Field Name | Required | Type | Description |
|-------|------------|----------|------|-------------|
| 0 | identifier | No | String | Unique identifier for the thesis |
| 1 | title | Yes | String | Thesis title |
| 2 | subtitle | No | String | Thesis subtitle |
| 3 | authors | No | String | Author(s), comma-separated for multiple |
| 4 | contact | No | String | Contact email (associated with first author) |
| 5 | supervisors | No | String | Supervisor(s), comma-separated for multiple |
| 6 | formats | No | String | Format(s), comma-separated for multiple |
| 7 | year | Yes | Integer | Year of thesis (e.g., 2024) |
| 8 | ap | No | String | AP program code (see AP Codes section) |
| 9 | orientation | No | String | Orientation code (see Orientation Codes section) |
| 10 | finality | No | String | Finality name |
| 11 | keywords | No | String | Keywords, comma-separated (max 10) |
| 12 | synopsis | No | Text | Synopsis/abstract of the thesis |
| 13 | context | No | Text | Context note |
| 14 | remarks | No | Text | Additional remarks |
| 15 | language | No | String | Language (e.g., Français, English, Nederlands) |
| 16 | access | No | String | Access authorization |
| 17 | license | No | String | License information |
| 18 | size_info | No | String | File size information |
| 19 | jury_points | No | Float | Jury score (out of 20) |
| 20 | baiu_link | No | String | Link to BAIU (institutional archive) |
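
The column order can be captured in a single list so that each parsed row becomes a dictionary keyed by field name. This is only a sketch: `FIELD_NAMES` and `row_to_record` are hypothetical helper names, and raising on a wrong column count is one possible way to enforce the 21-column rule.

```python
# Field names in column order (index 0 to 20), as listed in the table above.
FIELD_NAMES = [
    "identifier", "title", "subtitle", "authors", "contact", "supervisors",
    "formats", "year", "ap", "orientation", "finality", "keywords",
    "synopsis", "context", "remarks", "language", "access", "license",
    "size_info", "jury_points", "baiu_link",
]


def row_to_record(fields):
    """Map one parsed CSV row to a {field_name: raw_value} dictionary."""
    if len(fields) != len(FIELD_NAMES):
        raise ValueError(
            f"expected {len(FIELD_NAMES)} columns, got {len(fields)}"
        )
    return dict(zip(FIELD_NAMES, fields))
```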

Field Details

Required Fields

  • title: Must not be empty
  • year: Must not be empty and must be a valid integer

Multi-Value Fields

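These fields accept multiple values separated by commas. Because the comma is also the column delimiter, a field holding more than one value must be enclosed in double quotes: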

  • authors: e.g., "John Doe, Jane Smith"
  • supervisors: e.g., "Prof. A, Prof. B"
  • keywords: Maximum 10 keywords, e.g., "art, design, digital"
  • formats: e.g., "PDF, Video, Installation"
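
Once the CSV parser has removed the surrounding quotes, a multi-value field is a plain comma-separated string. A minimal sketch of splitting it, assuming empty entries are dropped and order is preserved (`split_values` and `MAX_KEYWORDS` are illustrative names):

```python
MAX_KEYWORDS = 10  # keyword limit stated in this specification


def split_values(raw, limit=None):
    """Split a comma-separated field into trimmed, non-empty values.

    The original order is kept. If limit is given, only the first
    `limit` values are returned.
    """
    values = [part.strip() for part in raw.split(",")]
    values = [value for value in values if value]
    return values[:limit]
```

For example, `split_values(row["keywords"], limit=MAX_KEYWORDS)` would keep at most the first 10 keywords of a row.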

Orientation Codes

Valid orientation codes and their full names:

SC = Sculpture
VI = Vidéographie
CA = Cinéma d'animation
IP = Installation-Performance
PE = Peinture
PH = Photographie
DE = Dessin
AN = Arts Numériques
GR = Graphisme
TY = Typographie
DN = Design Numérique
IL = Illustration
BD = Bande-Dessinée
SE = Sérigraphie
GV = Gravure
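
The code list above maps naturally onto a lookup table. The sketch below assumes codes are matched exactly after trimming whitespace; whether the importer also accepts lowercase codes is not stated here. `ORIENTATIONS` and `resolve_orientation` are hypothetical names.

```python
ORIENTATIONS = {
    "SC": "Sculpture",
    "VI": "Vidéographie",
    "CA": "Cinéma d'animation",
    "IP": "Installation-Performance",
    "PE": "Peinture",
    "PH": "Photographie",
    "DE": "Dessin",
    "AN": "Arts Numériques",
    "GR": "Graphisme",
    "TY": "Typographie",
    "DN": "Design Numérique",
    "IL": "Illustration",
    "BD": "Bande-Dessinée",
    "SE": "Sérigraphie",
    "GV": "Gravure",
}


def resolve_orientation(code):
    """Return the full orientation name, or None for an empty or unknown code."""
    if not code:
        return None
    return ORIENTATIONS.get(code.strip())
```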

AP Codes

Valid AP program codes:

  • DPM
  • LIENS
  • APS

These codes must exactly match the entries in the ap_programs table.

Language Values

Languages should be written with a capital first letter:

  • Français
  • English
  • Nederlands
  • etc.

Format Values

Common format values (case-insensitive, will be normalized):

  • PDF
  • Video
  • Audio
  • Installation
  • Web
  • etc.

Import Behavior

Row Processing

  1. Empty rows (no title and no identifier) are skipped
  2. Each row is processed in a transaction
  3. If a row fails, it is skipped and logged, but processing continues
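
The three steps above can be sketched with Python's built-in `sqlite3` module, whose connection object works as a context manager that commits on success and rolls back on error. The database engine, the `save_row` callback, and the choice not to count empty rows as errors are all assumptions made for this illustration.

```python
import sqlite3


def import_rows(conn, rows, save_row):
    """Import rows one by one, each in its own transaction.

    rows     : iterable of (row_number, record dict)
    save_row : callable(conn, record) that writes one thesis and raises
               ValueError or sqlite3.Error on failure (hypothetical)
    Returns (imported, skipped, log).
    """
    imported, skipped, log = 0, 0, []
    for row_number, record in rows:
        # Step 1: rows with neither title nor identifier are ignored.
        if not record.get("title") and not record.get("identifier"):
            continue
        try:
            # Step 2: one transaction per row (commit or rollback).
            with conn:
                save_row(conn, record)
            imported += 1
            log.append(f"✓ row {row_number}")
        except (ValueError, sqlite3.Error) as exc:
            # Step 3: the failed row is logged and the loop continues.
            skipped += 1
            log.append(f"✗ row {row_number}: {exc}")
    return imported, skipped, log
```

The returned counts and log lines correspond to the kind of summary described under Import Results.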

Data Validation

  • If title or year is missing, the row is rejected
  • Invalid orientation codes result in no orientation being set (null)
  • Invalid AP codes result in no AP program being set (null)
  • If more than 10 keywords are provided, only the first 10 are kept
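
The required-field rules can be expressed as a small check that returns the reasons a row would be rejected. This is a sketch only; `validate_record` and the wording of its messages are made up for the example.

```python
def validate_record(record):
    """Return a list of validation errors; an empty list means the row is accepted."""
    errors = []

    # title must not be empty
    if not (record.get("title") or "").strip():
        errors.append("title is required")

    # year must not be empty and must parse as an integer
    year = (record.get("year") or "").strip()
    if not year:
        errors.append("year is required")
    else:
        try:
            int(year)
        except ValueError:
            errors.append(f"year is not a valid integer: {year!r}")

    return errors
```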

Data Normalization

  • All string fields are trimmed of whitespace
  • Language and format values are normalized (first letter capitalized, rest lowercase)
  • Empty strings are converted to NULL in the database
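
A minimal sketch of these normalization rules, using `None` to stand in for a database NULL. The helper names `clean` and `normalize_label` are hypothetical.

```python
def clean(value):
    """Trim whitespace and turn an empty result into None (NULL)."""
    value = value.strip()
    return value or None


def normalize_label(value):
    """Normalize a language or format value: first letter upper, rest lower."""
    value = clean(value)
    return value.capitalize() if value else None
```

Read literally, the capitalization rule turns `PDF` into `Pdf`. Whether the importer treats acronyms differently is not stated in this document, so the sketch applies the rule as written.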

Entity Creation

  • Authors, supervisors, and keywords are automatically created if they don't exist
  • Existing authors are matched by name
  • Contact email is only associated with the first author
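
The match-by-name behaviour for authors resembles a "get or create" lookup. The sketch below uses `sqlite3` and an invented `authors (id, name, email)` table; the real schema is not described in this document. It also assumes that an existing author's stored email is left untouched, which is a guess.

```python
import sqlite3


def get_or_create_author(conn, name, email=None):
    """Return the id of the author with this name, inserting one if needed."""
    row = conn.execute(
        "SELECT id FROM authors WHERE name = ?", (name,)
    ).fetchone()
    if row:
        return row[0]  # existing author matched by name
    cursor = conn.execute(
        "INSERT INTO authors (name, email) VALUES (?, ?)", (name, email)
    )
    return cursor.lastrowid


def link_authors(conn, names, contact):
    """Resolve all authors of a thesis in order; only the first gets the contact."""
    author_ids = []
    for position, name in enumerate(names):
        email = contact if position == 0 else None
        author_ids.append(get_or_create_author(conn, name, email))
    return author_ids
```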

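Example CSV Structure

Only the French header row (row 2) and two data rows are shown here. A real file also needs the description row (row 3) and the column-name row (row 4) before the first data row.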

Identifiant,Titre,Sous-titre,Auteur·ice(s),Contact,Promoteur·ice(s),Format,Année,AP,Orientation,Finalité,Mots-clés,Synopsis,Contexte,Remarques,Langue,Autorisation,License,taille,Points sur 20,lien BAIU

TFE-2024-001,Mon projet artistique,Exploration du numérique,"Alice Dupont, Bob Martin",alice@example.com,Prof. Smith,PDF,2024,DPM,AN,Création,"art numérique, digital art, interactive installation",Un projet explorant l'intersection de l'art et de la technologie,Réalisé dans le cadre du master,Très bon projet,Français,Public,CC-BY,250MB,16.5,https://baiu.example.org/12345
TFE-2024-002,Design graphique moderne,,Charlie Brown,charlie@example.com,"Prof. A, Prof. B","PDF, Print",2024,LIENS,GR,Design,"typographie, graphisme, design",Une exploration de la typographie contemporaine,,,English,Restricted,All rights reserved,50MB,15,

Troubleshooting

Common Issues

  1. Encoding problems: Ensure file is saved as UTF-8
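  2. Wrong column count: All 21 columns must be present, even if empty. Extra columns usually mean that a multi-value field (authors, supervisors, formats, keywords) was not enclosed in double quotes
  3. Line breaks in fields: Ensure fields containing newlines are properly quoted
  4. Quote escaping: To include a double quote inside a quoted field, double it ("")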
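
Quoting mistakes are easiest to avoid by letting a CSV library write the rows. As an illustration, Python's standard `csv.writer` adds quotes around fields that contain commas, quotes, or line breaks, and doubles embedded quotes. The helper name `to_csv_line` is invented for this example.

```python
import csv
import io


def to_csv_line(fields):
    """Serialize one row to a CSV line with standard quoting and escaping."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter=",", lineterminator="\n").writerow(fields)
    return buffer.getvalue()
```

For instance, `to_csv_line(["Alice Dupont, Bob Martin", "2024"])` wraps the first field in double quotes and leaves the second one bare.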

Import Results

After import, the system will display:

  • Number of theses successfully imported
  • Number of rows skipped due to errors
  • Detailed line-by-line results with success (✓) or error (✗) indicators

Notes

  • The import process preserves the order of authors, supervisors, and keywords
  • The first author gets the contact email if provided
  • Duplicate detection is not performed: each import creates new entries, even for rows that were imported before
  • Failed rows do not stop the import process
  • All errors are logged to the server error log