How to block "Unknown robots identified by any ^&^%$#% existent"?

Where are you seeing this? Find your actual access logs on the server, find the bot’s IP there, and just block it on the server via htaccess/nginx.conf. Logging plugins are generally a bit crap.

1 Like

In AWStats. It’s a bot that makes more hits on my own website than I do.

Can you show us a screenshot from AWStats of what you’re referring to, so we can be on the same page? You can redact any info you don’t want shown publicly, like the number of visitors.

Of course. I don’t think I’ll violate this damn bot’s privacy if I post the stats! lol

Here it is: first in the list of bots that crawl the website. These are the stats for October, and this damn thing has already consumed 707 MB of bandwidth. I can’t find its IP and nothing seems to slow it down.

I have to revise my original reply; I noticed something in the robots database.

The Unknown robot identified by bot\* entry is matched using this pattern:
bot[\s_+:,\.\;\/\\\-]

It does group multiple bots based on this pattern. For example, bots with the following names in the user-agent would be put into this group:

bot_name
bot+name
bot:name
bot,name
bot.name
bot;name
bot/name
bot\name
bot-name
bot name
bot

You can try searching for bot+ or bot-, etc. in your logs. You can ignore name; that’s just there to show that the pattern can appear anywhere in the user-agent - beginning, middle or end.
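If you have shell access, you can hunt for these directly with grep, using roughly the same pattern AWStats does. This is just a sketch: the log path is an example for your server, and I’ve folded \s into a plain space and dropped the literal-backslash case to keep the shell quoting simple:

# Show recent requests whose user-agent contains "bot" plus a separator
grep -Ei 'bot[ _+:,.;/-]' /var/log/apache2/access.log | tail -20

Each matching line includes the client IP, so this also answers the “I can’t find its IP” problem for any bot in that group.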


And what does that mean?
I don’t understand your point. Do you mean that it groups various different bots, but also shows them as individual bots - is that what you mean?

The Unknown robot identified by bot\* is a group of bots that share similar names based on a specific pattern.

For example, “bot+robotName” and “123bot-SuperBot” would be grouped together, because the first contains bot+ and the second contains bot-. Each would come from a different IP.

So Unknown robot identified by bot\* could contain hundreds or even thousands of different bots because they all share similar keywords in their names.

I hope that clarifies it a bit.

Holy Sh*t! (recorded in AWStats because it contains an asterisk as well) :laughing:

Ok I got your point… there is no salvation…

But there is one: to block all the bots from .htaccess and allow only the well-known ones.
Does this count as a solution? (Though I don’t know how to do it.)

Unfortunately, you can’t block all bots. That’s why AWStats has a database to identify them. They act as a normal visitor would, so the only way to identify a bot is to know its name or guess it based on some common word.

However, it might be possible to use this pattern to block robots in htaccess:
bot[\s_+:,\.\;\/\\\-]

I did a quick check, and it is possible to block bots using that pattern.

BIG WARNING: This also blocks Googlebot, Bingbot, and any other good bots that contain “bot”. So I wouldn’t recommend using it.

Adding this to your .htaccess, somewhere below RewriteEngine On, should do the trick:

# Block any user-agent containing "bot" followed by a separator character
RewriteCond %{HTTP_USER_AGENT} bot[\s_+:,\.\;\/\\\-] [NC] 
# Deny the request with a 403 Forbidden
RewriteRule .* - [F,L]
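To answer the earlier question about allowing only the well-known ones: you could put a negated condition in front, something like this (just a sketch; the crawler names are common examples, and since user-agents are trivially spoofed this only keeps out honest bots):

# Skip the block for well-known crawlers (extend the list as needed)
RewriteCond %{HTTP_USER_AGENT} !(Googlebot|bingbot|DuckDuckBot|YandexBot) [NC]
# Block anything else matching the generic "bot" pattern
RewriteCond %{HTTP_USER_AGENT} bot[\s_+:,\.\;\/\\\-] [NC]
RewriteRule .* - [F,L]

Consecutive RewriteCond lines are ANDed together, so a request is only denied if it matches the bot pattern and is not one of the listed crawlers.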

You can test it using https://httpstatus.io/ and select “Googlebot” from the User Agent dropdown. Enter your domain and click “Check status”. If it’s implemented correctly, you should see status code 403.

Then you can switch to the “Your Browser” user agent, and the status code should be 200.
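If you’d rather test from the command line, curl can do the same two checks (example.com is a placeholder for your domain):

# Pretend to be Googlebot - should print 403 if the rule works
curl -s -o /dev/null -w "%{http_code}\n" -A "Googlebot/2.1" https://example.com/
# Plain request with curl's default user-agent - should print 200
curl -s -o /dev/null -w "%{http_code}\n" https://example.com/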

You could temporarily add it to your .htaccess for a day or two, to see if that affects AWStats.

But remember, it will block good bots too.
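For completeness, since nginx.conf was mentioned earlier: the nginx equivalent would be something like this inside the server block (same sketch status, same warning about good bots; I’ve simplified the character class to sidestep nginx quoting of backslashes):

# Return 403 for any user-agent containing "bot" plus a separator (case-insensitive)
if ($http_user_agent ~* "bot[\s_+:,.;/-]") {
    return 403;
}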

Thank you for the suggestion, but I have already done it and it didn’t do the trick. I tested it for a couple of days and it didn’t do anything.

It seems that not everything that we read online is valid.

Actually, it may have worked at blocking bots. I didn’t think about this before, but even when htaccess blocks an incoming request, the request is still recorded in the log. I don’t know how AWStats parses logs, but it’s possible it is “seeing” bots that were blocked by htaccess.

For Example

I sent a request to my website as “Superbot”. In the screenshot, #1 is the first line, which is the request I sent, and #2 is the response from my server: it denied the request with a 403 status code.

Now, when I look at my access.log, I can still see the request:
[15/Oct/2020:01:51:19 -0400] "GET / HTTP/1.1" 403 3577 "-" "Superbot"

To Summarize

Blocking in htaccess doesn’t prevent bot traffic from being recorded in the log. So AWStats would be inaccurate if you’re blocking bots using htaccess.
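If you do keep the htaccess block, one possible workaround: AWStats can read its log through a piped command (the same mechanism its docs use for gzipped logs), so you could strip the blocked 403 lines before AWStats parses them. A sketch, assuming the standard combined log format where the status code is the ninth space-separated field; double-check the pipe syntax against the AWStats docs for your version:

# In awstats.conf: feed AWStats the log with blocked (403) requests filtered out
LogFile="awk '$9 != 403' /var/log/apache2/access.log |"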

2 Likes

I see. Thank you very much for the explanation.
It seems that there is no solution, because in the first case I lose all the bots and in the second case I lose the stats.

The solution would be to analyze the logs directly (for example in Excel) and identify the specific User-Agent header values and/or IP addresses to block. AWStats is great at providing summaries, but in this case it happens to be hiding the information you need.
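If the log is too big for Excel, the classic shell one-liners give you the same rankings (assuming the standard combined log format, where the user-agent is the sixth quote-delimited field; the log path is an example):

# Top 20 user-agents by number of requests
awk -F'"' '{print $6}' /var/log/apache2/access.log | sort | uniq -c | sort -rn | head -20
# Top 20 client IPs by number of requests
awk '{print $1}' /var/log/apache2/access.log | sort | uniq -c | sort -rn | head -20

Anything near the top that isn’t a browser or a crawler you recognize is a candidate for a targeted block.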

1 Like

FYI @Marialena.S

I just checked one of my bigger sites and these are the bot stats for last month…

So, over 6,000 hits from unknown bots. It’s not something I have ever really worried about.

6 Likes
