Don’t forget to block access to your uploads and RSS feed. Here’s some code you can place in your .htaccess file (anywhere above the ClassicPress block):
In fact, it’s like trying to empty the sea with a teaspoon. You’re on the internet, and there’s no way to protect your content 100%. There are ways to limit its visibility with restricted areas, passwords etc. But crawlers and spiders turn up always and everywhere, because robots.txt directives are respected only when it is convenient. Furthermore, scraping bots are aggressive and pay no attention to anyone, especially those operating at the edge of legality.
There is no way to have your cake and eat it too. In other words: music authors have been fighting the digital piracy of their music for thirty years now. But it was a losing fight from the start, and many of them eventually gave up and adapted. Not to mention the authors of fonts or software.
So, as frustrating as it is, this is the situation.
Rather than worrying so much about something that you can’t actually fight (at least with current technologies), remember that images are still a source of income. That is to say: if you have good, marketable artwork, you might as well put these images up for sale on your website, so that someone interested in using an image for a t-shirt or ebook cover is encouraged to buy a license rather than steal it and use it illegally. With some precautions, clear conditions of use and a competitive price, you can greatly limit illegal use. It doesn’t eliminate it, but you can compensate with an additional line of business: downloadable digital products.
We’re telling you: it’s a losing battle with current technologies. Instead, think seriously about digitizing your products to expand your business. Many honest people would rather purchase the rights and download the image legally than steal it for a t-shirt or CD cover. You can use many digital product sales platforms (excluding DeviantArt) for this purpose, or you can set up an e-commerce site (e.g. Classic Commerce) and sell everything directly on your site.
Just putting my two cents in here…
I will not go into the copyright infringements by AI (as we all seem to agree on that), but I would like to comment on the technical (server) part.
Adding “Disallow” to robots.txt only works for legit bots and crawlers (Bingbot, Googlebot, DuckDuckGoBot, Ahrefs, DotBot, etc), because these actually request the robots.txt file first to check your permissions before crawling the rest of the site. Malicious bots never call, visit or read the robots.txt file, so you will need to block those in the .htaccess file in your site root.
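To illustrate why robots.txt only stops cooperative crawlers, here is a minimal Python sketch of the check a well-behaved bot runs before fetching a page. The robots.txt rules, the example.com URLs and the helper names are assumptions for illustration, not taken from any real site. Nothing forces a scraper to run this check at all, which is the whole problem.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (an assumption for this sketch).
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-content/uploads/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text the way a polite crawler would after downloading it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def polite_bot_may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """A compliant crawler asks this question first and skips the URL on False."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)
upload_url = "https://example.com/wp-content/uploads/artwork.png"
page_url = "https://example.com/about/"

print(polite_bot_may_fetch(parser, "Googlebot", upload_url))  # False
print(polite_bot_may_fetch(parser, "Googlebot", page_url))    # True
```

A malicious bot simply requests `upload_url` without ever calling anything like `polite_bot_may_fetch`, so the rule has to be enforced on the server side instead.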
Aha, found it! There is a doubled pipe (||) after LabyrinthBot.
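For anyone wondering why a stray pipe matters: in a regular-expression alternation, a doubled pipe creates an empty alternative, and an empty alternative matches every string. The Python sketch below shows the effect. Only LabyrinthBot comes from this thread; `ExampleScraper`, the sample browser string and the helper name are assumptions, and it is assumed that the regex engine behind the .htaccess rule treats an empty alternative the same way Python does.

```python
import re

# Hypothetical block lists; only LabyrinthBot is taken from the thread.
BROKEN_PATTERN = re.compile(r"LabyrinthBot||ExampleScraper", re.IGNORECASE)
FIXED_PATTERN = re.compile(r"LabyrinthBot|ExampleScraper", re.IGNORECASE)


def is_blocked(pattern: re.Pattern, user_agent: str) -> bool:
    """Unanchored search over the user agent, as a block list rule typically does."""
    return pattern.search(user_agent) is not None


ordinary_browser = "Mozilla/5.0 (X11; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0"

# The empty alternative between the two pipes matches anything, visitors included.
print(is_blocked(BROKEN_PATTERN, ordinary_browser))    # True
# With a single pipe, only the listed bots match.
print(is_blocked(FIXED_PATTERN, ordinary_browser))     # False
print(is_blocked(FIXED_PATTERN, "LabyrinthBot/1.0"))   # True
```

So a rule with this typo would tend to block every request rather than just the listed bots.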
Also, the WordPress scraper BuiltWith states on their website that their bot uses BuiltWith as its user agent, but my log files show that is incorrect. They use BW/1.2 and a URL ending with oupwis. So I added that.
What I did not take into account when posting my first reply was that my hosting provider has ModSecurity installed on the LiteSpeed server, which runs several security checks, but often too strictly…
Yes, it works. Now let’s see if it will produce results. However, I would like to point out that I have a fairly powerful firewall that rejects any attempt at illegal access. Thank you for sharing it. I have several, but this one looks very good.
As for robots.txt: I have blocked a number of legitimate bots, but it is also interesting to do the opposite and establish who can enter and who cannot.
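That opposite approach, an allowlist, can be sketched as a robots.txt that admits named crawlers and turns everyone else away. The choice of Googlebot and Bingbot as the admitted bots, the example.com URL and the helper name are assumptions. Python’s standard `urllib.robotparser` is used here only to check how a compliant crawler would read the rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical allowlist: an empty Disallow means "nothing is off limits"
# for the named bots, while every other user agent is shut out of the site.
ALLOWLIST_ROBOTS_TXT = """\
User-agent: Googlebot
User-agent: Bingbot
Disallow:

User-agent: *
Disallow: /
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return what a robots.txt-compliant crawler would conclude for this URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


url = "https://example.com/gallery/"
for bot in ("Googlebot", "Bingbot", "SomeOtherBot"):
    print(bot, may_crawl(ALLOWLIST_ROBOTS_TXT, bot, url))
```

As with any robots.txt rule, this only binds crawlers that choose to honor it.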
I took a look at the plugins. Interesting. But I wouldn’t know to what extent they are compatible with my firewall and my other htaccess rules. I’m now trying the snippets suggested by Jeff Starr, and we’ll see what happens. For the moment, I’m noticing that my firewall no longer logs any attempted XSS attacks. This is probably because the attempts are blocked at the htaccess level and are no longer intercepted by the firewall.
@iljester Do you mind if I ask which firewall you use?
I use Wordfence, but obviously some settings related to core files are not active when using CP.
I wrote to you privately (some information I only share in private). I’ll just say that my site is constantly under attack. While this is almost normal for a WP-based website, it’s my fault for attracting rubbish because of the links to my site in the credits of the themes I published on WP.
Now I’m trying to fix it with theme updates.
Tip: if you publish a theme, don’t link directly to your websites!
Thanks for the info.
Any of my sites that have been public for a while are subjected to attacks on a daily basis as well. I am not sure there is even a realistic way to stop them.