Each sample in the corpus includes a scanned image of the original document together with its transcription. When the spelling departs from current standards, the sample also provides a modernized transcription, prepared according to the current general rules of Spanish orthography, and a critical edition that compares the original transcription with the modernized one.
Each sample in the corpus has been assigned a unique code that allows it to be identified according to various criteria:
- the language (ES = Spanish);
- the type of text (CHIS = joke; ANEC = anecdote; EPIG = epigram; ENTR = short theatrical interlude; RELA = short story; CUEN = tale; NARR = short narrative; CRON = chronicle; NOTI = news item; OBIT = obituary; ESQU = death notice);
- the format (1 = text; 2 = single-panel cartoon; 3 = comic strip; 0 = others);
- the publication medium (1 = newspaper; 2 = magazine; 3 = brochure; 4 = booklet; 5 = fanzine; 6 = almanac; 7 = book; 0 = others);
- the place of publication (MAD = Madrid; VAL = Valencia; etc.);
- and the identification number within the corpus.
For example, the code 00370_ES_CHIS23_MAD corresponds to sample number 370 of Humtext: a joke (CHIS) in single-panel cartoon format (2), published in a brochure (3) in Madrid (MAD).
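To make the coding scheme concrete, here is a minimal Python sketch that decodes a sample code into its components. It assumes the layout `NNNNN_LANG_TYPEfm_PLACE` (identification number, language, four-letter text type followed by one format digit and one medium digit, then place), which is inferred from the single example above rather than from a formal specification. The names `parse_code` and `SampleCode` are hypothetical, and only the place codes MAD and VAL are known, so any other place code is returned unchanged.

```python
import re
from typing import NamedTuple

# Lookup tables transcribed from the coding scheme described above.
LANGUAGES = {"ES": "Spanish"}
TEXT_TYPES = {
    "CHIS": "joke", "ANEC": "anecdote", "EPIG": "epigram",
    "ENTR": "short theatrical interlude", "RELA": "short story",
    "CUEN": "tale", "NARR": "short narrative", "CRON": "chronicle",
    "NOTI": "news item", "OBIT": "obituary", "ESQU": "death notice",
}
FORMATS = {"1": "text", "2": "single-panel cartoon", "3": "comic strip", "0": "others"}
MEDIA = {"1": "newspaper", "2": "magazine", "3": "brochure", "4": "booklet",
         "5": "fanzine", "6": "almanac", "7": "book", "0": "others"}
# Only MAD and VAL are listed explicitly ("etc."); other place codes pass through as-is.
PLACES = {"MAD": "Madrid", "VAL": "Valencia"}

# Assumed layout, inferred from one example: NNNNN_LANG_TYPEfm_PLACE
CODE_RE = re.compile(r"^(\d+)_([A-Z]{2})_([A-Z]{4})(\d)(\d)_([A-Z]+)$")


class SampleCode(NamedTuple):
    number: int
    language: str
    text_type: str
    text_format: str
    medium: str
    place: str


def parse_code(code: str) -> SampleCode:
    """Decode a sample code; raise ValueError if it does not fit the assumed layout."""
    m = CODE_RE.match(code)
    if m is None:
        raise ValueError(f"unrecognized sample code: {code!r}")
    num, lang, ttype, fmt, med, place = m.groups()
    # Unknown abbreviations are returned raw instead of failing.
    return SampleCode(
        number=int(num),  # leading zeros are padding, e.g. "00370" -> 370
        language=LANGUAGES.get(lang, lang),
        text_type=TEXT_TYPES.get(ttype, ttype),
        text_format=FORMATS.get(fmt, fmt),
        medium=MEDIA.get(med, med),
        place=PLACES.get(place, place),
    )
```

With this sketch, `parse_code("00370_ES_CHIS23_MAD").number` is `370`, `.text_type` is `"joke"` and `.place` is `"Madrid"`. Returning unknown abbreviations unchanged keeps the parser usable if the corpus adds place codes beyond those listed here.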