Description – HumText

Each sample in the corpus includes a scanned image of the original document, accompanied by its transcription. Where the spelling deviates from current standards, the sample also provides a modernized transcription, prepared according to the current general rules of Spanish orthography, and a critical edition that compares the original transcription with the modernized one.

Each sample in the corpus has been assigned a unique code that allows it to be identified according to various criteria:

the language (ES = Spanish);
the type of text (CHIS = joke; ANEC = anecdote; EPIG = epigram; ENTR = short theatrical interlude; RELA = short story; CUEN = tale; NARR = short narrative; CRON = chronicle; NOTI = news item; OBIT = obituary; ESQU = death notice);
the format (1 = text; 2 = single-panel cartoon; 3 = comic strip; 0 = others);
the publication medium (1 = newspaper; 2 = magazine; 3 = brochure; 4 = booklet; 5 = fanzine; 6 = almanac; 7 = book; 0 = others);
the publication place (MAD = Madrid; VAL = Valencia; etc.);
and the identification number within the corpus.

For example, the code 00370_ES_CHIS23_MAD corresponds to sample number 370 of HumText, classified as a joke in single-panel cartoon format (2), published in a brochure (3) in Madrid.
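
The structure of these codes can be illustrated with a short Python sketch that splits a code into its parts and looks each one up in the lists above. This is only an illustration, not part of HumText: the function name parse_sample_code and the table names are hypothetical, the layout (number, language, type followed by the format and medium digits, place) is inferred from the single example code, and the place table holds only the two cities named in the text.

```python
import re

# Lookup tables copied from the lists above.
LANGUAGES = {"ES": "Spanish"}
TEXT_TYPES = {
    "CHIS": "joke", "ANEC": "anecdote", "EPIG": "epigram",
    "ENTR": "short theatrical interlude", "RELA": "short story",
    "CUEN": "tale", "NARR": "short narrative", "CRON": "chronicle",
    "NOTI": "news item", "OBIT": "obituary", "ESQU": "death notice",
}
FORMATS = {"1": "text", "2": "single-panel cartoon",
           "3": "comic strip", "0": "others"}
MEDIA = {"1": "newspaper", "2": "magazine", "3": "brochure", "4": "booklet",
         "5": "fanzine", "6": "almanac", "7": "book", "0": "others"}
# Only the two places named in the text; the full list is not given there.
PLACES = {"MAD": "Madrid", "VAL": "Valencia"}

# Assumed layout, inferred from the example code:
# NUMBER_LANGUAGE_TYPE<format digit><medium digit>_PLACE
CODE_PATTERN = re.compile(
    r"(?P<number>\d+)_(?P<language>[A-Z]{2})_"
    r"(?P<text_type>[A-Z]{4})(?P<format>\d)(?P<medium>\d)_"
    r"(?P<place>[A-Z]+)"
)


def parse_sample_code(code):
    """Split a sample code into its fields and translate them to labels."""
    match = CODE_PATTERN.fullmatch(code)
    if match is None:
        raise ValueError(f"not a recognized sample code: {code!r}")
    fields = match.groupdict()
    try:
        return {
            "number": int(fields["number"]),
            "language": LANGUAGES[fields["language"]],
            "text_type": TEXT_TYPES[fields["text_type"]],
            "format": FORMATS[fields["format"]],
            "medium": MEDIA[fields["medium"]],
            # Places missing from the table are returned as their raw code.
            "place": PLACES.get(fields["place"], fields["place"]),
        }
    except KeyError as exc:
        raise ValueError(
            f"unknown value {exc.args[0]!r} in code {code!r}"
        ) from exc
```

Calling parse_sample_code("00370_ES_CHIS23_MAD") returns a dictionary whose number field is 370 and whose remaining fields are the labels found in the tables. A code that does not match the assumed layout, or that uses a type, format, or medium value missing from the lists, raises ValueError.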