Each sample in the corpus includes a scanned image of the original document, accompanied by its transcription. Where the spelling deviates from current standards, two further versions are provided: a modernized transcription, prepared in accordance with the general contemporary rules of Spanish orthography, and a critical edition, which compares the original transcription with the modernized one.
Each sample in the corpus has been assigned a unique code that allows it to be identified according to various criteria:
- the language (ES = Spanish);
- the type of text (CHIS = joke; ANEC = anecdote; EPIG = epigram; ENTR = short theatrical interlude; RELA = short story; CUEN = tale; NARR = short narrative; CRON = chronicle; NOTI = news item; OBIT = obituary; ESQU = death notice);
- the format (1 = text; 2 = single-panel cartoon; 3 = comic strip; 0 = others);
- the publication medium (1 = newspaper; 2 = magazine; 3 = brochure; 4 = booklet; 5 = fanzine; 6 = almanac; 7 = book; 0 = others);
- the publication place (MAD = Madrid; VAL = Valencia; etc.);
- and the identification number within the corpus.
For example, the code 00370_ES_CHIS23_MAD corresponds to sample number 370 of Humtext, classified as a joke (CHIS) in single-panel cartoon format (2), published in a brochure (3) in Madrid (MAD).
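The coding scheme can be read mechanically. The following is a minimal Python sketch, not part of the corpus tooling: the function name `decode_sample_code` and the regular expression are hypothetical, and the field order (number, language, text type followed by the format and medium digits, place) is an assumption inferred from the single example code above. The lookup tables are copied from the list of criteria; only the two places named there are included.

```python
import re

# Lookup tables copied from the list of criteria in the text.
LANGUAGES = {"ES": "Spanish"}
TEXT_TYPES = {
    "CHIS": "joke", "ANEC": "anecdote", "EPIG": "epigram",
    "ENTR": "short theatrical interlude", "RELA": "short story",
    "CUEN": "tale", "NARR": "short narrative", "CRON": "chronicle",
    "NOTI": "news item", "OBIT": "obituary", "ESQU": "death notice",
}
FORMATS = {"1": "text", "2": "single-panel cartoon", "3": "comic strip", "0": "others"}
MEDIA = {
    "1": "newspaper", "2": "magazine", "3": "brochure", "4": "booklet",
    "5": "fanzine", "6": "almanac", "7": "book", "0": "others",
}
# The text names only these two places explicitly ("etc." covers the rest).
PLACES = {"MAD": "Madrid", "VAL": "Valencia"}

# Assumed layout: <number>_<language>_<type><format digit><medium digit>_<place>
CODE_PATTERN = re.compile(r"(\d+)_([A-Z]{2})_([A-Z]{4})(\d)(\d)_([A-Z]+)")


def _lookup(table, key, field):
    """Return the label for key, or raise ValueError naming the field."""
    if key not in table:
        raise ValueError(f"unknown {field} code: {key!r}")
    return table[key]


def decode_sample_code(code):
    """Split a sample code into its labelled components."""
    match = CODE_PATTERN.fullmatch(code)
    if match is None:
        raise ValueError(f"code does not match the assumed layout: {code!r}")
    number, lang, text_type, fmt, medium, place = match.groups()
    return {
        "number": int(number),
        # Unknown languages and places fall back to the raw code,
        # because the text does not give the complete lists.
        "language": LANGUAGES.get(lang, lang),
        "text_type": _lookup(TEXT_TYPES, text_type, "text type"),
        "format": _lookup(FORMATS, fmt, "format"),
        "medium": _lookup(MEDIA, medium, "medium"),
        "place": PLACES.get(place, place),
    }


if __name__ == "__main__":
    for field, value in decode_sample_code("00370_ES_CHIS23_MAD").items():
        print(f"{field}: {value}")
```

The format and medium digits are looked up strictly in the tables given in the list of criteria, so a code with a digit or text type outside those tables raises an error rather than being guessed.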