@Simone and I have been chatting a bit about the Update Manager plugin and agree that it would be helpful to know where and what versions of our plugins are installed. I didn’t implement this initially as it seemed like it could be a GDPR concern. However, I’m not an expert at GDPR. I’m not sure if this data constitutes personally identifiable information (PII), whether it would need controls to opt out and/or remove data, etc. Any feedback and discussion is appreciated. To be clear, this isn’t planned at this point; this is more of a discovery phase to see if this is even viable/permissible.
I am not a GDPR expert either, but I don’t see the IP address of the website and version of the plugin falling under personally identifiable information.
If you were logging the end user’s IP (a user of said website) then that would likely fall under personally identifiable information. Though, you could argue you are saving the IP information for “security purposes”, meaning you can collect that data but have to document why, and how you will delete said data in a reasonable time frame.
It’s not the IP address of the server but the host name (exactly get_bloginfo('url')) because of shared hosting.
As a fallback we can use the host name and save just its MD5.
Other info could be plugin version, PHP version, CP version, locale.
I would think saving the MD5 would be sufficient, but maybe @timkaye can offer some other insights
GDPR, the gift that keeps on giving
I haven’t followed this discussion so, for the moment at least, I can only address the specific point Wade raised.
I’m afraid the IP address of a website on which a plugin is being used definitely is personally identifiable information. GDPR includes info that, when connected to other info, can help identify someone.
Hashing the IP address might be enough, especially given the nature of this particular info; hashing and salting, or using a hash stronger than MD5, certainly would be.
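For illustration, here is a minimal sketch of the "hashing and salting" idea, assuming SHA-256 through PHP's built-in hash_hmac() and a secret salt kept only by whoever collects the data. The function name anonymize_site_url and the salt value are hypothetical and not part of the Update Manager plugin.

```php
<?php
// Hypothetical helper: turns a site URL into a keyed SHA-256 digest.
// $secret_salt is an assumed server-side secret, not something any existing plugin defines.
function anonymize_site_url(string $url, string $secret_salt): string
{
    // Normalize so that trivial variations (case, trailing slash) map to the same identifier.
    $normalized = strtolower(rtrim(trim($url), '/'));

    // HMAC-SHA256 mixes the secret into the digest, so the value cannot be
    // recomputed from the URL alone.
    return hash_hmac('sha256', $normalized, $secret_salt);
}

$salt = 'replace-with-a-long-random-secret'; // assumption: kept private by the collector
$key  = anonymize_site_url('https://widgets.example.com/', $salt);
echo strlen($key), "\n"; // a SHA-256 digest is 64 hex characters
```

The same URL and salt always give the same key, so the value can still serve as a stable identifier, while a different salt gives an unrelated key.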
Thanks for your thoughts, all. While I’m a tad annoyed with GDPR, I’m feeling sort of happy that I was at least on the right track in feeling this could be problematic. @timkaye, I’d like to follow up with a clarifying question, if I may.
Let’s say we irreversibly hash the URL and use that value as a key for storing the plugin’s version data. Can we use that same key to track the fact that a user has Plugin A and Plugin B installed at versions x.x.x and x.x.x, respectively? To put it more visually, is the following ok to store?
$stored_data = [
    '2afh6s5l6jad7lf5a46ls6hg7ahg3lad6f2s62lka26dsflhas2d57lf22h3asf' => [
        'plugin-a/plugin-a.php' => '1.2.3',
        'plugin-b/plugin-b.php' => '0.1.0',
    ],
    'ka7lf5a46d6f2s62ldl2afh6s5l6js6hg7ahg3laa26dsfflhas2d57lf22h3as' => [
        'plugin-a/plugin-a.php' => '1.0.0',
        'plugin-b/plugin-b.php' => '1.2.1',
    ],
];
This would give us the data we’re after (i.e., which versions of which plugins are installed), without needing to know the installation locations. Also notable (perhaps) is that this would be designed to track and store version data about our own plugins that use the Update Manager for updates, not for tracking all the user’s plugin versions.
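To make the idea concrete, a rough sketch of how a report could be recorded under a hashed key is shown here. The record_versions function is hypothetical, and an unsalted SHA-256 digest is only an assumed stand-in for "irreversibly hash the URL"; nothing here reflects how the Update Manager actually works.

```php
<?php
// Hypothetical helper, not part of the Update Manager plugin.
// Stores (or overwrites) the version report for one site, keyed by its hashed URL.
function record_versions(array $stored_data, string $site_key, array $versions): array
{
    // Overwriting by key means a site that reports twice is still counted once.
    $stored_data[$site_key] = $versions;
    return $stored_data;
}

$stored_data = [];
// Assumption: an unsalted SHA-256 digest stands in for the irreversible hash.
$site_key = hash('sha256', 'https://widgets.example.com');

$stored_data = record_versions($stored_data, $site_key, ['plugin-a/plugin-a.php' => '1.2.2']);
$stored_data = record_versions($stored_data, $site_key, [
    'plugin-a/plugin-a.php' => '1.2.3',
    'plugin-b/plugin-b.php' => '0.1.0',
]);

echo count($stored_data), "\n"; // 1: the second report replaced the first
```

Using the hash as the array key is what lets repeat reports from one site update a single entry instead of inflating the totals.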
Yes, that would be fine to store.
You should also have a notice somewhere, explaining that you obtain the IP address purely for statistical purposes and store it in a way that makes it essentially impossible to decode (even by you). You would not need to get the user’s consent for this, but you should take reasonable steps to make them aware that that’s what you’re doing.
If I want to find out the info for widgets.example.com it’s trivial for anyone with the dataset to do so - it’s really not hiding anything at all.
Yes, you can’t just list all the sites, but I’m not at all convinced that’s PII in the first place.
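The lookup described above can be sketched as follows, assuming the keys are plain, unsalted SHA-256 digests of the site URL. The site_key and lookup_site names and the toy dataset are made up for the example.

```php
<?php
// Assumption: keys are unsalted SHA-256 digests of the site URL.
function site_key(string $url): string
{
    return hash('sha256', $url);
}

// Toy dataset standing in for the collected version data.
$stored_data = [
    site_key('https://widgets.example.com') => ['plugin-a/plugin-a.php' => '1.2.3'],
    site_key('https://gadgets.example.com') => ['plugin-a/plugin-a.php' => '1.0.0'],
];

// Anyone who can guess a URL can recompute its key and read the matching entry.
function lookup_site(array $stored_data, string $guessed_url): ?array
{
    return $stored_data[site_key($guessed_url)] ?? null;
}

var_dump(lookup_site($stored_data, 'https://widgets.example.com') !== null); // bool(true)
var_dump(lookup_site($stored_data, 'https://unknown.example.com') !== null); // bool(false)
```

The dataset cannot be turned back into a list of sites, but a single known site can be checked against it with one hash computation.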
Whether you’re convinced or not, GDPR is very clear on the point.
Would it be any different if the key was simply a randomly generated, unique string?
That certainly can’t be reversed, but at that point I’m not sure what the point is; all you have is a key and some info, and the list of keys will expand forever. You could aggregate the data over a period of time to get an estimate, but download stats are going to be more accurate than that with far less effort.
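The aggregation mentioned above might look roughly like this, assuming the $stored_data layout proposed earlier in the thread (one entry per key, mapping plugin files to versions). The count_versions function and the short keys are hypothetical.

```php
<?php
// Hypothetical sketch: count how many reporting sites run each version of one plugin.
function count_versions(array $stored_data, string $plugin_file): array
{
    $versions = [];
    foreach ($stored_data as $site_versions) {
        // Skip sites that did not report this plugin.
        if (isset($site_versions[$plugin_file])) {
            $versions[] = $site_versions[$plugin_file];
        }
    }
    // array_count_values() maps each version string to its number of occurrences.
    return array_count_values($versions);
}

// Keys are placeholders; they could be hashes or random strings.
$stored_data = [
    'key-1' => ['plugin-a/plugin-a.php' => '1.2.3', 'plugin-b/plugin-b.php' => '0.1.0'],
    'key-2' => ['plugin-a/plugin-a.php' => '1.0.0'],
    'key-3' => ['plugin-a/plugin-a.php' => '1.2.3'],
];

print_r(count_versions($stored_data, 'plugin-a/plugin-a.php'));
```

Only the totals per version matter here, so the keys never need to be meaningful. That is also why entries left behind by abandoned keys would skew the estimate over time.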
Points taken. Another thing occurred to me and hashing URLs won’t actually solve the problem.
Edit: I’ve decided to handle it differently.