WSJ SI84: 7138 sentences (after removing speaker 401's duplicate session from the original data).
WSJ SI284: 37416 sentences (after removing speaker 401's duplicate session from the data set).
Speaker 401 contributed two recording sessions and therefore has twice as many sentences as the other speakers. During processing, remove 11-2.1/wsj0/si_tr_s/401 so that no single speaker is over-represented.
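The removal step described above can be sketched as a simple filter over a list of audio file paths. This is a minimal sketch, not the actual preparation script: the function name `drop_excluded_session`, the case-insensitive matching, and the example paths are all assumptions made for illustration. Only the directory string 11-2.1/wsj0/si_tr_s/401 comes from the notes.

```python
# Directory to exclude, taken from the notes above.
EXCLUDED_DIR = "11-2.1/wsj0/si_tr_s/401"


def drop_excluded_session(paths, excluded=EXCLUDED_DIR):
    """Return the paths that do not lie under the excluded directory.

    Matching is case-insensitive (an assumption, since corpus copies may
    differ in letter case) and requires a trailing slash so that a
    sibling directory such as .../4010 would not be dropped by accident.
    """
    needle = excluded.strip("/").lower() + "/"
    return [p for p in paths if needle not in p.lower()]


# Hypothetical file-list entries; the file names are made up.
example_paths = [
    "/data/11-2.1/wsj0/si_tr_s/401/utt_a.wv1",
    "/data/11-1.1/wsj0/si_tr_s/011/utt_b.wv1",
    "/DATA/11-2.1/WSJ0/SI_TR_S/401/UTT_C.WV1",
    "/data/11-2.1/wsj0/si_tr_s/402/utt_d.wv1",
]

kept = drop_excluded_session(example_paths)
for path in kept:
    print(path)
```

Filtering on the directory path rather than on the speaker ID alone drops only the session stored under that disc, which is the stated intent of the step.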
The 5K and 20K test sets for WSJ0 are the Nov.92 test sets.
The 64K test set of WSJ1 is the commonly cited Nov.93 test set.