Document Distance Data Sets
From 6.006 Introduction to Algorithms
Here are nine sample text files, mostly from Project Gutenberg, for use as input to the document distance problem:
- t1.verne.txt: Verne's In the Year 2889 (25K bytes)
- t2.bobsey.txt: Hope's The Bobbsey Twins on Blueberry Island (268K bytes)
- t3.lewis.txt: Lewis and Clark's History of the Expedition under the Command of Captains Lewis and Clark (Vol. I) (1M bytes)
- t4.arabian.txt: Anon's The Arabian Nights Entertainments Complete (3M bytes)
- t5.churchill.txt: Churchill's The Complete Works of Winston Churchill (10M bytes)
- t6.onemillion.txt: List of one million integers (from 000000 to 999999) (8M bytes)
- t7.tenmillion.txt: List of ten million integers (from 0000000 to 9999999) (90M bytes)
- t8.shakespeare.txt: The Complete Works of William Shakespeare (5.5M bytes)
- t9.bacon.txt: Essays by Francis Bacon (320K bytes)
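The two integer lists are synthetic, so files of the same shape can be regenerated locally. The following is a minimal sketch, not the script that produced the originals: the function names `padded_integers` and `write_integer_file` are hypothetical, and the layout (one zero-padded integer per line, two-byte CRLF line endings) is an assumption inferred only from the listed sizes.

```python
def padded_integers(count, width):
    """Yield the integers 0 .. count-1 as zero-padded strings of the given width."""
    for n in range(count):
        yield f"{n:0{width}d}"


def write_integer_file(path, count, width, newline="\r\n"):
    """Write one zero-padded integer per line and return the bytes written.

    The CRLF default is an assumption, chosen because it reproduces the
    sizes quoted in the list; the real files may be laid out differently.
    """
    total = 0
    # newline="" disables newline translation, so exactly `newline` is written.
    with open(path, "w", encoding="ascii", newline="") as f:
        for s in padded_integers(count, width):
            line = s + newline
            f.write(line)
            total += len(line)  # ASCII text: one byte per character
    return total
```

Under these assumptions, `write_integer_file("t6.onemillion.txt", 1_000_000, 6)` writes 8,000,000 bytes and `write_integer_file("t7.tenmillion.txt", 10_000_000, 7)` writes 90,000,000 bytes, which matches the 8M and 90M figures above.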