I am writing a spidering program that performs many of the functions you describe, but it is written from the viewpoint of image processing, so it does not do any parsing of HTML yet. However, it is very adept at spidering sites, and it lets you include certain file and tag types as well as filter on specific file name matches. It could probably be converted fairly easily to handle your task.
The program is, however, still very much a beta, and has some possible memory and configuration problems that I am working on. It only runs from a command prompt, and does not have any sort of Web or GUI interface. I am working on those aspects as I clean up the spidering logic of the code.
If this program interests you, let me know, and I can email you a copy. Otherwise, you can wait for the final version, which will have a web-based and/or Windows-based interface.
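
For illustration only, the kind of file-type inclusion and file-name filtering mentioned above could be sketched roughly like this in Python. This is not the spider's actual code: the function name should_fetch, the extension set, and the exclude patterns are all hypothetical and chosen just to show the idea.

```python
import fnmatch
import posixpath
from urllib.parse import urlparse

# Hypothetical settings: file types to include and file names to skip.
INCLUDE_EXTENSIONS = {".jpg", ".jpeg", ".gif", ".png"}
EXCLUDE_NAME_PATTERNS = ["thumb_*", "*_small.*"]


def should_fetch(url):
    """Return True if the URL's file type is included and its name is not filtered out."""
    # Look only at the path part of the URL, ignoring any query string.
    path = urlparse(url).path
    name = posixpath.basename(path).lower()
    ext = posixpath.splitext(name)[1]

    # Inclusion rule: the extension must be one of the wanted types.
    if ext not in INCLUDE_EXTENSIONS:
        return False

    # Filter rule: skip any file whose name matches an exclude pattern.
    return not any(fnmatch.fnmatchcase(name, pat) for pat in EXCLUDE_NAME_PATTERNS)
```

A spider would call something like should_fetch on each link it discovers before downloading it, so adapting it to another task would mostly be a matter of changing the two settings at the top.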
------------------
Fred Hirsch
Web Consultant & Programmer