I was writing up drafts of this on the doc page when I reached the topic of “escaping” and got stuck in a self-contradictory limbo of “escape, but not”.
As those who follow the #programming channel in Slack probably know, I have already established that “escape everything” is not what happens in core code, nor what should happen in all cases.
WordPress itself does not escape the_content or the_title in any way, for example, and it does not sanitize the content on save either, at least not for WP admins and other users with unfiltered_html.
If you google this topic, you will find all sorts of garbage explanations of why this is the case and what to do about it.
You will literally find everything from “you don’t need to worry because WordPress escapes it for you” to “it is safe because it applies the the_content filter”. Google it; it is worth the time to see how poorly informed the world is about this problem.
Not one post I found actually explains why it is not escaped and how WordPress ensures that it is safe anyway.
What I have found by investigating this myself is the following:
- WordPress does not escape the output of these functions in any way, neither in the functions themselves nor upstream, because if it did, we could barely write a blog post with simple HTML and could never, for example, include a form on a page. We would have to pass an immense array of allowed tags to wp_kses, and it would never be complete, because we could never anticipate, say, the custom HTML attributes some user adds to a simple div. Effectively, that would make it very hard to let anyone build more than “just a simple blog”.
- WordPress ensures that echoing the_content is safe by making sure only trusted users can save certain types of content.
Nota bene: not by sanitising, but by checking capabilities.
Mainly, that is the “unfiltered_html” capability. With this capability you can save anything (including malicious JS or other dangerous markup) to your post content, and it will render as such on the front end. As soon as an editor without that capability re-saves your post or tries to save such content, it gets stripped or escaped.
You can also read more about this in the WordPress developer reference for the_title, or in “Reporting Security Vulnerabilities” on Make WordPress Core.
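To make the two points above concrete, here is a minimal plain-PHP sketch that runs without WordPress loaded. It is an illustration under stated assumptions, not core's actual code path: PHP's built-in htmlspecialchars() stands in for an esc_html()-style escaping step, strip_tags() is a crude stand-in for the filtering core applies to untrusted users, and the hypothetical $can_unfiltered_html flag replaces a real current_user_can( 'unfiltered_html' ) check. The function names are made up for this example.

```php
<?php
// Plain-PHP sketch; WordPress is NOT loaded. Assumptions:
// htmlspecialchars() stands in for esc_html(), strip_tags() for kses-style
// filtering, and $can_unfiltered_html for current_user_can( 'unfiltered_html' ).

/**
 * Hypothetical save step: trusted users keep their HTML verbatim,
 * everyone else has their markup removed before it is stored.
 */
function sketch_save_content(string $html, bool $can_unfiltered_html): string
{
    if ($can_unfiltered_html) {
        // Safety comes from the capability check, not from sanitising:
        // the content is stored exactly as submitted.
        return $html;
    }
    // Untrusted author: strip all tags (a crude stand-in for wp_kses).
    return strip_tags($html);
}

/**
 * Hypothetical blanket-escaping output step, i.e. what would happen
 * if the_content were always escaped on output.
 */
function sketch_escape_output(string $html): string
{
    return htmlspecialchars($html, ENT_QUOTES, 'UTF-8');
}

$post = '<form method="post"><input name="email"><button>Subscribe</button></form>';

// Saved by a user with unfiltered_html and echoed unescaped: a working form.
$stored_by_admin = sketch_save_content($post, true);
echo $stored_by_admin, PHP_EOL;

// Same content escaped on output: the browser would show the tags as text.
echo sketch_escape_output($stored_by_admin), PHP_EOL;

// Saved by a user without unfiltered_html: the markup never gets stored.
echo sketch_save_content($post, false), PHP_EOL;
```

The second echo is what a blanket “escape everything” rule would do to the_content: the visitor sees the form's tags as text instead of a working form. In this model, the safety comes entirely from the capability check at save time.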
So that much is clear: “always escape, no matter what” is not the correct advice.
The thing is, organisations such as WordPress VIP are adamant that “you must escape everything”.
Of course, they do not even mention that the_content is unescaped (see Validating, sanitizing, and escaping · WordPress VIP Documentation).
The problem I now have is this: if I say in the doc “you must always escape”, we contradict core's own code.
If I say “You must escape only when needed”, then I need to specify when you have to escape.
Technically, it is not safe to assume that even the_content is “safe to not escape”.
This is because it applies filters, so code can alter its contents even after you save it, not just an editor with the unfiltered_html capability.
And that would in turn mean you must escape even the_content.
Which would then effectively prevent you from putting any form on your website, to name just one example.
So I think the approach WP takes here is “If you have 5 holes in a wall and a flood comes, and you can patch 3 of them, you patch 3 of them, so you only have to worry about 2”.
However, I would like to write a more useful doc than one built on phrases like the above.
I thought of this:
You should always escape, and as late as possible, but if for some reason you cannot (for example, when outputting the_content or when creating output that requires special HTML), you may choose not to escape. In those cases, you must ensure that the input feeding that content is as safe as possible.
For example, if you were to accept user input through a POSTed value from any user role (including, but not limited to, guests), escaping is of course a must (apart from sanitisation).
However, if you were to work with a value stored in the database that only admins or users with unfiltered_html can save, then you may choose not to escape.
The rule of thumb: always escape, and as late as possible. If you cannot, make sure the input you echo is produced by trusted users. Structure your code so that any reviewer can determine immediately, without reading through lots of code, that the input you echo is indeed safe.
Do NOT exclude those lines with WPCS “ignore” statements when submitting a plugin for review; instead, add a clear comment to your code that helps the reviewer follow the path to your input and determine whether the content is indeed safe.
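The proposed rule can be sketched in the same plain-PHP style, again without WordPress loaded. htmlspecialchars() stands in for esc_html(), and SketchTrustedStore, its capability flag and sketch_render() are hypothetical names invented for illustration. In a real plugin the write path would presumably be gated by current_user_can( 'unfiltered_html' ), and the POSTed value would also be sanitised on input (omitted here for brevity). The point is the shape: untrusted input is escaped right at output, and the one deliberately unescaped echo carries a comment telling the reviewer exactly where the value comes from, instead of a WPCS ignore.

```php
<?php
// Plain-PHP sketch of "escape late, or document why not". WordPress is NOT
// loaded: htmlspecialchars() stands in for esc_html(), and the store and its
// capability flag are hypothetical.

/** Hypothetical settings store that only accepts writes from trusted users. */
final class SketchTrustedStore
{
    private array $values = [];

    public function save(string $key, string $html, bool $can_unfiltered_html): bool
    {
        // The only write path, gated by capability (stand-in for
        // current_user_can( 'unfiltered_html' )). Untrusted callers store nothing.
        if (!$can_unfiltered_html) {
            return false;
        }
        $this->values[$key] = $html;
        return true;
    }

    public function get(string $key): string
    {
        return $this->values[$key] ?? '';
    }
}

function sketch_render(array $post_data, SketchTrustedStore $store): string
{
    $out = '';

    // Untrusted input (any role, including guests): escape as late as
    // possible, i.e. right here at output. Input sanitisation omitted.
    $name = (string) ($post_data['name'] ?? '');
    $out .= '<p>Hello, ' . htmlspecialchars($name, ENT_QUOTES, 'UTF-8') . '</p>';

    // Intentionally NOT escaped. Reviewer: this value can only be written by
    // SketchTrustedStore::save(), which rejects callers without unfiltered_html.
    $out .= $store->get('signup_form');

    return $out;
}

$store = new SketchTrustedStore();
$store->save('signup_form', '<form method="post"><button>Join</button></form>', true);
$store->save('signup_form', '<script>alert(1)</script>', false); // rejected, not stored

echo sketch_render(['name' => '<script>alert(1)</script>'], $store), PHP_EOL;
```

Keeping the gated write path in one small method means a reviewer can verify the unescaped echo by reading a few lines rather than tracing the whole plugin. As with the_content, this only moves the trust boundary: anything that later filters the stored value can still change what gets echoed.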
What is everyone’s take on this?
Any suggestions on how to explain this without creating a lot of confusion, but also without hiding the fact that core functions like the_content do not escape and that you do indeed have the option to output unescaped content?
I would very much like to hear @pluginvulns’ opinion on this as well. I assume you guys are aware that the_content is unescaped, and when scanning plugins you will come across such situations now and then. Any suggestions in that regard?