Most file types contain some header information. For this purpose, you should validate files based on that header, not on a string that usually appears somewhere in the file. Another thing would be to check HTML files to make sure they do not contain encoded parts that are decoded when the page is accessed.
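To illustrate what "validating based on the header" could look like, here is a minimal sketch in Python. It checks the leading "magic number" bytes of a file against a small, hand-picked table of signatures (GIF, JPEG, PNG) and rejects anything not on an allowed list. The names `MAGIC_SIGNATURES`, `sniff_type` and `is_allowed` are hypothetical, and the table is an assumption for illustration, not a complete list. Note that a matching signature only shows the file starts like the claimed type. It says nothing about whether the rest of the file is benign.

```python
from typing import Optional, Set

# Hypothetical, illustrative table of well-known leading-byte signatures.
# Only the start of the file is checked, never a string found elsewhere in it.
MAGIC_SIGNATURES = {
    "gif": (b"GIF87a", b"GIF89a"),
    "jpeg": (b"\xff\xd8\xff",),
    "png": (b"\x89PNG\r\n\x1a\n",),
}


def sniff_type(header: bytes) -> Optional[str]:
    """Return the type whose signature the header starts with, else None."""
    for file_type, signatures in MAGIC_SIGNATURES.items():
        # bytes.startswith accepts a tuple of candidate prefixes
        if header.startswith(signatures):
            return file_type
    return None


def is_allowed(path: str, allowed: Set[str]) -> bool:
    """Accept a file only if its leading bytes match an allowed type."""
    with open(path, "rb") as f:
        header = f.read(16)  # 16 bytes covers every signature in the table
    return sniff_type(header) in allowed


if __name__ == "__main__":
    print(sniff_type(b"GIF89a\x01\x00"))
    print(sniff_type(b"<html>GIF89a"))
```

The second call returns `None` even though the string `GIF89a` appears in the data, because it is not at the start of the file. That is exactly the difference between checking the header and searching for a string.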
WRT OpenVMS: text-type files (plain text, HTML, etc.) have file attributes, such as their record format, which differentiate them from binary files such as executables.
OTOH, checking the internal structure of HTML files would mean writing the equivalent of a parser/compiler, and all that would only serve to protect a user who opens an unknown web page with Java enabled. JavaScript intrinsically protects the user by running in a "sandbox", so it should not be able to do permanent damage.
What did you mean by "based on the header"? Un*x files store no metadata describing their type, only their actual contents. Under OpenVMS, files such as GIF or JPEG images may have any of several attributes, since the Apache server reads them one byte at a time (IIRC), not record by record.
Suggestions for a list of permitted file types would be gratefully considered.
May the great bird of the universe roost upon your planet!
MikeR