
Getting text from a document



C# Programming
10-08-2009, 03:40 AM
I'm working on a utility that reads files and extracts the words, a sort of indexing project. I've got a lot of formats covered, but a few, like PDF and DOC, are giving me trouble. Since I'm working in C#, I'll ask here.

Has anyone tried to mine the text from these formats, and if so, was it possible without an intermediary file? Ideally I would like to be able to pass the file into a StreamReader-derived class and read the text out of the other end.
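
To make the shape of the question concrete, here is a minimal sketch of the consuming side only. It assumes the indexer accepts any TextReader, so a format-specific reader for PDF or DOC (hypothetical, not shown, and the actual subject of the question) could be dropped in without an intermediary file. The names WordReader and ReadWords are made up for illustration, and "word" is assumed to mean a run of letters or digits.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper: pulls words out of any TextReader.
public static class WordReader
{
    // Yields each run of letters/digits as one word, reading a character at a time
    // so the whole document never has to be held in memory.
    public static IEnumerable<string> ReadWords(TextReader reader)
    {
        var word = new StringBuilder();
        int c;
        while ((c = reader.Read()) != -1)
        {
            if (char.IsLetterOrDigit((char)c))
            {
                word.Append((char)c);
            }
            else if (word.Length > 0)
            {
                // A separator ends the current word.
                yield return word.ToString();
                word.Length = 0;
            }
        }
        // Emit the final word if the text did not end with a separator.
        if (word.Length > 0)
        {
            yield return word.ToString();
        }
    }
}
```

For plain text this would be called with an ordinary `new StreamReader(path)`. The open part is whether a TextReader subclass that decodes PDF or DOC content on the fly can be written or found.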

Any ideas, guys and gals?


Panic, Chaos, Destruction.
My work here is done.