Reuters Corpus Volume I (v2) - subset 4 data set

1: Description. Reuters Corpus Volume I (RCV1) is an archive of over 800,000 manually categorized newswire stories recently made available by Reuters, Ltd. for research purposes. Using this data for research on text categorization requires a detailed understanding of the real-world constraints under which the data was produced. This data set contains subset 4 of Reuters Corpus Volume I.
2: Type. Multi-label
3: Origin. Real world
4: Instances. 6000
5: Features. 47229
6: Labels. 101
7: Missing values. No
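Because the data set is multi-label with 101 labels, a story may carry several categories at once, and its target is commonly encoded as a 0/1 indicator vector of length 101. The following is a minimal sketch of that encoding. It assumes labels are given as integer indices 0 to 100. The function name `to_indicator` and the toy label sets are hypothetical illustrations, not part of the distributed data set files.

```python
from typing import List, Set

NUM_LABELS = 101  # label count stated in the data set description above


def to_indicator(label_sets: List[Set[int]], num_labels: int = NUM_LABELS) -> List[List[int]]:
    """Encode each instance's set of label indices as a 0/1 row of length num_labels.

    Hypothetical helper: assumes labels are already mapped to indices 0..num_labels-1.
    """
    rows = []
    for labels in label_sets:
        row = [0] * num_labels
        for j in labels:
            if not 0 <= j < num_labels:
                raise ValueError(f"label index {j} out of range")
            row[j] = 1  # mark this label as present for the instance
        rows.append(row)
    return rows


# Hypothetical toy example: three stories with made-up label indices.
example = [{0, 5}, {12}, {3, 40, 100}]
matrix = to_indicator(example)
print(len(matrix), len(matrix[0]))  # 3 101
print([sum(r) for r in matrix])     # [2, 1, 3]
```

Applied to all 6000 instances, the same encoding would yield a 6000 x 101 label matrix, alongside a 6000 x 47229 feature matrix.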