Welcome! This is a prototype, pre-alpha distribution of the Overview open source document set visualization and exploration tool. As of 2013 this system is no longer maintained. See the production version at https://overviewdocs.com instead.
The quick version:
myfile.csv (see format below)
something.bat instead of the shell scripts
Again, see https://blog.overviewdocs.com/2012/02/25/getting-started-with-the-overview-prototype/
Overview takes a CSV file of document text as input, one document per row. The simplest possible format that Overview will read has exactly one column, named "text":

    text
    this is the content of document the first
    and here is the text of document the second
    etc. . . .
This will work, but without a "uid" field your saved tags are based on row numbers, so they will break if you later add documents to this file. To avoid that, include a "uid" field like this:

    uid,text
    UNIQUEID_AAA, this is the content of document the first
    UNIQUEID_BBB,and here is the text of document the second
    etc. . . .
The uid field can be any unique identifier, such as a hash of the document text. Finally, if you want Overview to display the document in its embedded browser instead of just showing the text, you can add a URL field.
    uid,text,url
    UNIQUEID_AAA, this is the content of document the first,http://docs.com/AAA
    UNIQUEID_BBB,and here is the text of document the second,http://docs.com/BBB
    etc. . . .
Overview does not do any sort of web scraping with this URL; it just uses it to display the document.
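If you are generating the input file from a script, something like the following Python sketch produces the three-column layout. It is only an illustration: the helper names `make_uid` and `write_overview_csv` are made up here, and using a SHA-1 hex digest as the uid is just one way to get "a hash of the document text".

```python
import csv
import hashlib
import io


def make_uid(text: str) -> str:
    """Derive a stable identifier from the document text.

    Assumption: a SHA-1 hex digest; any unique identifier would do.
    """
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def write_overview_csv(docs, out):
    """Write (text, url) pairs as uid,text,url rows to a file-like object.

    csv.writer quotes any field that contains commas, quotes, or newlines.
    """
    writer = csv.writer(out)
    writer.writerow(["uid", "text", "url"])
    for text, url in docs:
        writer.writerow([make_uid(text), text, url])


# Example documents; the URLs are the placeholder ones used above.
docs = [
    ("this is the content of document the first", "http://docs.com/AAA"),
    ("and here is the text, with a comma\nand a second line", "http://docs.com/BBB"),
]

buf = io.StringIO()
write_overview_csv(docs, buf)
print(buf.getvalue())
```

To write to disk instead of a string buffer, open the file with `open("myfile.csv", "w", newline="", encoding="utf-8")` and pass it as `out`. Because the uid depends only on the text, adding more documents later does not change the identifiers of the existing ones.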
The "text" field for each document has to be quoted and escaped according to the normal CSV rules if the document text runs more than one line or has commas in it. HTML text is fine, because Overview simply strips all tags before processing. There is no hard upper limit on the number of documents, but the current UI gets a bit bogged down at about the 10,000 to 20,000 range.
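Before loading a file, it can help to check it against the rules above. The sketch below is not part of Overview; `check_overview_csv` and `SLOW_UI_THRESHOLD` are hypothetical names, and the threshold simply takes the lower end of the 10,000 to 20,000 range mentioned above. It relies on Python's standard `csv` module, which understands the usual quoting rules for commas and multi-line fields.

```python
import csv
import io

# Assumption: lower end of the range where the UI reportedly slows down.
SLOW_UI_THRESHOLD = 10_000


def check_overview_csv(stream):
    """Return (document_count, problems) for a CSV in the format described above."""
    reader = csv.DictReader(stream)
    fields = reader.fieldnames or []
    problems = []
    if "text" not in fields:
        problems.append('missing required "text" column')
        return 0, problems

    seen = set()
    count = 0
    for row in reader:
        count += 1
        if "uid" in fields:
            uid = row["uid"]
            if uid in seen:
                problems.append(f"duplicate uid {uid!r} in document {count}")
            seen.add(uid)

    if "uid" not in fields:
        problems.append('no "uid" column: saved tags depend on row numbers')
    if count >= SLOW_UI_THRESHOLD:
        problems.append(f"{count} documents: the UI may be slow")
    return count, problems


# A small in-memory sample with a comma and a line break inside quoted text.
sample = io.StringIO(
    'uid,text\n'
    'UNIQUEID_AAA,"first document, with a comma"\n'
    'UNIQUEID_BBB,"second document\nspanning two lines"\n'
)
count, problems = check_overview_csv(sample)
print(count, problems)
```

For a real file, pass `open("myfile.csv", newline="", encoding="utf-8")` as the stream; the `newline=""` argument lets the `csv` module handle line breaks inside quoted fields itself.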
need help? ask!