2
votes

CakePHP's page on Data Sanitization states that one should store possibly raw HTML from user input in the database and sanitize it at the time of output:

For sanitization against XSS its generally better to save raw HTML in database without modification and sanitize at the time of output/display.

Why would it be preferable to store (potentially dangerous) HTML in one's database and only sanitize it for output? Wouldn't sanitizing first result in smaller storage while yielding the same function?

The only reason I can see for storing raw HTML like this is if some pages sanitize the output while others either do not sanitize it or are more or less strict about it.


2 Answers

3
votes

One big reason that comes to mind is faulty filtering of the data. If you applied an overly aggressive filter to incoming HTML, the content would be permanently damaged, and you would need to have it all entered again to recover it. If you sanitize on output, you always have "the original" and can adjust the filtering as appropriate.
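As a rough illustration of that loss, here is a minimal sketch in plain PHP. The input string and variable names are made up for the example, and `strip_tags()` stands in for "an overly aggressive filter"; it is not what CakePHP itself does.

```php
<?php
// Hypothetical user input: a sentence that talks about a tag.
$raw = 'The <blink> tag is deprecated';

// Filtering on input: the tag is stripped before storage and cannot be recovered.
$storedFiltered = strip_tags($raw);

// Storing raw and escaping on output: the database keeps the original,
// and the escaping rules can still be changed later.
$storedRaw = $raw;
$display = htmlspecialchars($storedRaw, ENT_QUOTES, 'UTF-8');

echo $storedFiltered, PHP_EOL; // The  tag is deprecated
echo $display, PHP_EOL;        // The &lt;blink&gt; tag is deprecated
```

The filtered copy no longer says which tag the user meant, while the raw copy can be re-rendered with any filter chosen later.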

3
votes

You want to have the original data on hand in its original state to prevent accidental removal of content by overly aggressive cleanup scripts.

In CakePHP, you should use the h() shortcut on everything that a user entered into the system when echoing it in the view.
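For reference, h() is a convenience wrapper around PHP's htmlspecialchars(). The sketch below uses a hypothetical stand-in, `escapeHtml()`, so that it runs without the framework; the exact flags CakePHP passes may differ by version, so treat `ENT_QUOTES` and `'UTF-8'` as assumptions.

```php
<?php
// Hypothetical stand-in for CakePHP's h(), so the sketch runs without the framework.
function escapeHtml($text)
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// Hypothetical comment as stored, unmodified, in the database.
$comment = '<script>alert("xss")</script>';

// In a CakePHP view this would simply be: echo h($comment);
$safe = escapeHtml($comment);
echo $safe, PHP_EOL; // &lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;
```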

If you're using the Sanitize class, I would suggest creating a method that sanitizes a record, calling it from the model's afterFind() callback, and applying it to each record that is returned. If that's not desired, you can still call your sanitize method on the data as needed.
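A minimal sketch of that idea, assuming the CakePHP 2.x result layout (`$results[$i]['ModelName']['field']`). The helper name `sanitizeResults()` and the `Comment` model are hypothetical, and plain htmlspecialchars() is used in place of the Sanitize class so the example runs on its own:

```php
<?php
// Hypothetical helper: escapes every string field in a CakePHP 2.x-style
// result set, leaving non-string values (ids, counts) untouched.
function sanitizeResults(array $results)
{
    array_walk_recursive($results, function (&$value) {
        if (is_string($value)) {
            $value = htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
        }
    });
    return $results;
}

// Inside a model, the callback could delegate to the helper:
//   public function afterFind($results, $primary = false) {
//       return sanitizeResults($results);
//   }

// Hypothetical rows as a find() call might return them.
$rows = array(
    array('Comment' => array('id' => 1, 'body' => '<b>Hi</b> & bye')),
);
$clean = sanitizeResults($rows);
echo $clean[0]['Comment']['body'], PHP_EOL; // &lt;b&gt;Hi&lt;/b&gt; &amp; bye
```

If records are escaped this way, do not also pass them through h() in the view, or they will be double-encoded, and do not save the escaped copy back to the database.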