
Mailing List Archive: Interchange: users

New Robot UAs

 

 



justinl at fragrancenet

Nov 3, 2010, 1:09 PM

Post #1 of 14 (429 views)
New Robot UAs

Everyone,

I've been going through Apache data lately and have discovered at
least two new robots
that ought to be added to robots.cfg:

1) 'bingbot'
http://www.bing.com/toolbox/blogs/webmaster/archive/2010/06/28/bing-crawler-bingbot-on-the-horizon.aspx

Looks like Microsoft has changed the UA for MSNBot, and Interchange
no longer recognizes it.

2) 'facebookexternalhit'
http://www.facebook.com/externalhit_uatext.php

This is really only relevant for webmasters who have done some sort
of Facebook integration, e.g. the Like Button or Facebook Connect.


We'll continue poring over the robot data and will let you know if we
come up with any more.

--
Regards,
Justin La Sotten
FragranceNet.com
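
For anyone who wants to repeat this kind of survey, here is a minimal
sketch of tallying user agents from an Apache access log. It assumes the
Combined Log Format (user agent as the last double-quoted field) and UA
strings without embedded quotes; the function name and sample lines are
made up for illustration.

```python
import re
from collections import Counter

# Assumption: Combined Log Format, where the user agent is the last
# double-quoted field on the line and contains no embedded quotes.
UA_FIELD = re.compile(r'"([^"]*)"\s*$')


def count_user_agents(lines):
    """Return a Counter mapping each user-agent string to its hit count."""
    counts = Counter()
    for line in lines:
        match = UA_FIELD.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Illustrative sample lines, not real log data.
SAMPLE = [
    '192.0.2.1 - - [03/Nov/2010:13:09:00 -0400] "GET / HTTP/1.1" 200 512 '
    '"-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"',
    '192.0.2.2 - - [03/Nov/2010:13:09:05 -0400] "GET /item HTTP/1.1" 200 900 '
    '"-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"',
    '192.0.2.1 - - [03/Nov/2010:13:09:09 -0400] "GET /cart HTTP/1.1" 200 300 '
    '"-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"',
]

if __name__ == "__main__":
    # Print agents from most to least frequent.
    for agent, hits in count_user_agents(SAMPLE).most_common():
        print(hits, agent)
```

Sorting by hit count makes high-volume agents that are missing from
robots.cfg easy to spot.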

_______________________________________________
interchange-users mailing list
interchange-users [at] icdevgroup
http://www.icdevgroup.org/mailman/listinfo/interchange-users


racke at linuxia

Jan 7, 2011, 6:10 AM

Post #2 of 14 (325 views)
Re: New Robot UAs [In reply to]

On 11/03/2010 04:09 PM, Justin La Sotten wrote:
> Everyone,
>
> I've been going through Apache data lately and have discovered at
> least two new robots
> that ought to be added to robots.cfg:
>
> 1) 'bingbot'
> http://www.bing.com/toolbox/blogs/webmaster/archive/2010/06/28/bing-crawler-bingbot-on-the-horizon.aspx
>
> Looks like Microsoft has changed the UA for MSNBot, but Interchange is
> no longer recognizing it.
>

It should still recognize it, as the simple term "bot" is in the RobotUA list.
Nevertheless, I added it.

Thanks
Racke



--
LinuXia Systems => http://www.linuxia.de/
Expert Interchange Consulting and System Administration
ICDEVGROUP => http://www.icdevgroup.org/
Interchange Development Team




emailgrant at gmail

Jan 15, 2011, 5:38 PM

Post #3 of 14 (315 views)
Re: New Robot UAs [In reply to]

>> Everyone,
>>
>> I've been going through Apache data lately and have discovered at
>> least two new robots
>> that ought to be added to robots.cfg:
>>
>> 1) 'bingbot'
>>
>> http://www.bing.com/toolbox/blogs/webmaster/archive/2010/06/28/bing-crawler-bingbot-on-the-horizon.aspx
>>
>> Looks like Microsoft has changed the UA for MSNBot, but Interchange is
>> no longer recognizing it.
>>
>
> It should still recognize it, as the simple term "bot" is in the RobotUA
> list.
> Neverless I added it.
>
> Thanks
>        Racke

SurveyBot is also not triggering the spider detection. Maybe the
"bot" entry only works if it appears after a delimiter in the UA?

- Grant
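
To make the hypotheses concrete, here is a rough sketch comparing three
ways a "bot" entry could be matched against a UA string. None of these
functions is Interchange's actual matching code; the names are
hypothetical and the SurveyBot string is only an illustrative example.

```python
import re

# Hypothetical matchers for comparison only; not Interchange's real code.


def substring_match(ua, term="bot"):
    """Case-sensitive substring test."""
    return term in ua


def substring_match_nocase(ua, term="bot"):
    """Case-insensitive substring test."""
    return term.lower() in ua.lower()


def delimiter_match(ua, term="bot"):
    """Case-insensitive test where the term must start the UA string
    or directly follow a non-alphanumeric character."""
    pattern = r"(?:^|[^A-Za-z0-9])" + re.escape(term)
    return re.search(pattern, ua, re.IGNORECASE) is not None


AGENTS = {
    "bingbot": "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)",
    "SurveyBot": "SurveyBot/2.3 (Whois Source)",  # illustrative string
    "AdsBot": "AdsBot-Google-Mobile (+http://www.google.com/mobile/adsbot.html)",
}

if __name__ == "__main__":
    for name, ua in AGENTS.items():
        print(name,
              substring_match(ua),
              substring_match_nocase(ua),
              delimiter_match(ua))
```

With these sample strings, the delimiter rule would reject all three
agents, bingbot included, while a case-sensitive substring test would
miss only SurveyBot. Which behavior a given Interchange version has
needs to be checked against its source.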



emailgrant at gmail

Jan 15, 2011, 5:42 PM

Post #4 of 14 (307 views)
Re: New Robot UAs [In reply to]

>>> Everyone,
>>>
>>> I've been going through Apache data lately and have discovered at
>>> least two new robots
>>> that ought to be added to robots.cfg:
>>>
>>> 1) 'bingbot'
>>>
>>> http://www.bing.com/toolbox/blogs/webmaster/archive/2010/06/28/bing-crawler-bingbot-on-the-horizon.aspx
>>>
>>> Looks like Microsoft has changed the UA for MSNBot, but Interchange is
>>> no longer recognizing it.
>>>
>>
>> It should still recognize it, as the simple term "bot" is in the RobotUA
>> list.
>> Neverless I added it.
>>
>> Thanks
>>        Racke
>
> SurveyBot is also not triggering the spider detection.  Maybe the
> "bot" entry only works if it appears after a delimiter in the UA?
>
> - Grant

AdsBot, as in "AdsBot-Google-Mobile
(+http://www.google.com/mobile/adsbot.html)", is not detected either.

- Grant



racke at linuxia

Jan 16, 2011, 9:45 AM

Post #5 of 14 (315 views)
Re: New Robot UAs [In reply to]

On 01/15/2011 08:42 PM, Grant wrote:
>>>> Everyone,
>>>>
>>>> I've been going through Apache data lately and have discovered at
>>>> least two new robots
>>>> that ought to be added to robots.cfg:
>>>>
>>>> 1) 'bingbot'
>>>>
>>>> http://www.bing.com/toolbox/blogs/webmaster/archive/2010/06/28/bing-crawler-bingbot-on-the-horizon.aspx
>>>>
>>>> Looks like Microsoft has changed the UA for MSNBot, but Interchange is
>>>> no longer recognizing it.
>>>>
>>>
>>> It should still recognize it, as the simple term "bot" is in the RobotUA
>>> list.
>>> Neverless I added it.
>>>
>>> Thanks
>>> Racke
>>
>> SurveyBot is also not triggering the spider detection. Maybe the
>> "bot" entry only works if it appears after a delimiter in the UA?
>>
>> - Grant
>
> Also AdsBot of "AdsBot-Google-Mobile
> (+http://www.google.com/mobile/adsbot.html)".

That works for me (no session, $Session->{spider} == 1) with current
Interchange from Git.

Regards
Racke


--
LinuXia Systems => http://www.linuxia.de/
Expert Interchange Consulting and System Administration
ICDEVGROUP => http://www.icdevgroup.org/
Interchange Development Team




emailgrant at gmail

Jan 16, 2011, 9:52 AM

Post #6 of 14 (305 views)
Re: New Robot UAs [In reply to]

>>>>> Everyone,
>>>>>
>>>>> I've been going through Apache data lately and have discovered at
>>>>> least two new robots
>>>>> that ought to be added to robots.cfg:
>>>>>
>>>>> 1) 'bingbot'
>>>>>
>>>>>
>>>>> http://www.bing.com/toolbox/blogs/webmaster/archive/2010/06/28/bing-crawler-bingbot-on-the-horizon.aspx
>>>>>
>>>>> Looks like Microsoft has changed the UA for MSNBot, but Interchange is
>>>>> no longer recognizing it.
>>>>>
>>>>
>>>> It should still recognize it, as the simple term "bot" is in the RobotUA
>>>> list.
>>>> Neverless I added it.
>>>>
>>>> Thanks
>>>>        Racke
>>>
>>> SurveyBot is also not triggering the spider detection.  Maybe the
>>> "bot" entry only works if it appears after a delimiter in the UA?
>>>
>>> - Grant
>>
>> Also AdsBot of "AdsBot-Google-Mobile
>> (+http://www.google.com/mobile/adsbot.html)".
>
> That works for me (no session, $Session->{spider} == 1) with current
> Interchange from Git.

OK, thanks for testing. It must be because I'm on 5.6.3.

- Grant



emailgrant at gmail

Dec 29, 2011, 4:49 PM

Post #7 of 14 (122 views)
Re: New Robot UAs [In reply to]

> Everyone,
>
> I've been going through Apache data lately and have discovered at
> least two new robots
> that ought to be added to robots.cfg:
>
> 1) 'bingbot'
> http://www.bing.com/toolbox/blogs/webmaster/archive/2010/06/28/bing-crawler-bingbot-on-the-horizon.aspx
>
> Looks like Microsoft has changed the UA for MSNBot, but Interchange is
> no longer recognizing it.
>
> 2) 'facebookexternalhit'
> http://www.facebook.com/externalhit_uatext.php

facebookexternalhit should be added to robots.cfg. Here is Facebook's
explanation of how it's used:

"Facebook allows its users to send links to interesting web content to
other Facebook users. Part of how this works on the Facebook system
involves the temporary display of certain images or details related to
the web content, such as the title of the webpage or the embed tag of
a video. Our system retrieves this information only after a user
provides us with a link. You may have found this page because a
Facebook user sent a link from your website to other Facebook users.
If you have any questions or concerns about any links or content sent
by one of our users, please contact us at legal [at] facebook"

http://www.facebook.com/externalhit_uatext.php

Facebook itself retrieves the external page or image with the
facebookexternalhit UA, so that UA shouldn't be given a session.
Should I submit a bug for this?

- Grant


> This is really only relevant for webmasters that have done some sort
> of facebook integration; ie. Like Button, Facebook Connect
>
>
> We'll continue pouring over the robot data, we'll let you know if we
> come up with anymore.
>
> --
> Regards,
> Justin La Sotten
> FragranceNet.com



emailgrant at gmail

Dec 29, 2011, 5:03 PM

Post #8 of 14 (120 views)
Re: New Robot UAs [In reply to]

>> Everyone,
>>
>> I've been going through Apache data lately and have discovered at
>> least two new robots
>> that ought to be added to robots.cfg:
>>
>> 1) 'bingbot'
>>
>> http://www.bing.com/toolbox/blogs/webmaster/archive/2010/06/28/bing-crawler-bingbot-on-the-horizon.aspx
>>
>> Looks like Microsoft has changed the UA for MSNBot, but Interchange is
>> no longer recognizing it.
>>
>
> It should still recognize it, as the simple term "bot" is in the RobotUA
> list.
> Neverless I added it.
>
> Thanks
>        Racke

If bingbot in robots.cfg doesn't change any behavior, I suggest removing it.

- Grant



jon at endpoint

Dec 29, 2011, 5:12 PM

Post #9 of 14 (121 views)
Re: New Robot UAs [In reply to]

On Thu, 29 Dec 2011, Grant wrote:

>>> I've been going through Apache data lately and have discovered at
>>> least two new robots that ought to be added to robots.cfg:
>>>
>>> 1) 'bingbot'
>>>
>>> http://www.bing.com/toolbox/blogs/webmaster/archive/2010/06/28/bing-crawler-bingbot-on-the-horizon.aspx
>>>
>>> Looks like Microsoft has changed the UA for MSNBot, but Interchange is
>>> no longer recognizing it.
>>
>> It should still recognize it, as the simple term "bot" is in the
>> RobotUA list. Neverless I added it.
>
> If bingbot in robots.cfg doesn't change any behavior, I suggest it is
> removed.

I disagree: I think it's better to explicitly add bingbot even if bot is
already there, because later someone thinking "bot" is overly broad would
be likely to remove it with no idea that they're now making "bingbot" fail
to be recognized as a robot.

Jon
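
A small sketch of the point, assuming the robot list is compiled into a
single case-insensitive alternation of literal entries. That is an
assumption for illustration, not a statement of how Interchange builds
its RobotUA pattern, and the helper names are hypothetical.

```python
import re


def compile_robot_list(entries):
    """Join literal entries into one case-insensitive alternation.

    Hypothetical helper for illustration, not Interchange's parser.
    """
    return re.compile("|".join(re.escape(e) for e in entries), re.IGNORECASE)


def is_robot(pattern, ua):
    """True if any entry in the compiled list occurs in the UA string."""
    return pattern.search(ua) is not None


BING_UA = "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"

# List relying only on the broad "bot" entry.
broad_only = compile_robot_list(["bot", "spider"])
# Same list after someone drops "bot" as overly broad.
broad_removed = compile_robot_list(["spider"])
# List with an explicit entry, after the same cleanup.
explicit_kept = compile_robot_list(["spider", "bingbot"])

if __name__ == "__main__":
    print(is_robot(broad_only, BING_UA))
    print(is_robot(broad_removed, BING_UA))
    print(is_robot(explicit_kept, BING_UA))
```

The explicit entry is redundant while "bot" is present, but it keeps
bingbot recognized if the broad entry is ever dropped.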

--
Jon Jensen
End Point Corporation
http://www.endpoint.com/



jon at endpoint

Dec 29, 2011, 5:14 PM

Post #10 of 14 (120 views)
Re: New Robot UAs [In reply to]

On Thu, 29 Dec 2011, Grant wrote:

> Facebook itself retrieves the external page or image with the
> facebookexternalhit UA so that UA shouldn't be given a session. Should I
> submit a bug for this?

By far the easiest thing for me and, I suspect, several other developers is
for you to fork the Interchange repository on GitHub:

https://github.com/interchange/interchange

Then make the change, and then email the list here with your commit URL.
It's then extremely simple for us to commit your change to the main
repository.

If you don't want to do that, a patch emailed to the list is great too.

Thanks,
Jon

--
Jon Jensen
End Point Corporation
http://www.endpoint.com/



emailgrant at gmail

Dec 30, 2011, 4:04 PM

Post #11 of 14 (123 views)
Re: New Robot UAs [In reply to]

>> Facebook itself retrieves the external page or image with the
>> facebookexternalhit UA so that UA shouldn't be given a session. Should I
>> submit a bug for this?
>
>
> By far the easiest thing for me and I suspect several other developers is
> for you to fork the interchange repository on GitHub:
>
> https://github.com/interchange/interchange
>
> Then make the change, and then email the list here with your commit URL.
> It's then extremely simple for us to commit your change to the main
> repository.
>
> If you don't want to do that, a patch emailed to the list is great too.
>
> Thanks,
>
> Jon

Got it, I'll submit patches to the list until I learn how to use git.

- Grant



emailgrant at gmail

Dec 30, 2011, 4:11 PM

Post #12 of 14 (121 views)
Re: New Robot UAs [In reply to]

>>>> I've been going through Apache data lately and have discovered at least
>>>> two new robots that ought to be added to robots.cfg:
>>>>
>>>> 1) 'bingbot'
>>>>
>>>>
>>>> http://www.bing.com/toolbox/blogs/webmaster/archive/2010/06/28/bing-crawler-bingbot-on-the-horizon.aspx
>>>>
>>>> Looks like Microsoft has changed the UA for MSNBot, but Interchange is
>>>> no longer recognizing it.
>>>
>>>
>>> It should still recognize it, as the simple term "bot" is in the RobotUA
>>> list. Neverless I added it.
>>
>>
>> If bingbot in robots.cfg doesn't change any behavior, I suggest it is
>> removed.
>
>
> I disagree: I think it's better to explicitly add bingbot even if bot is
> already there, because later someone thinking "bot" is overly broad would be
> likely to remove it with no idea that they're now making "bingbot" fail to
> be recognized as a robot.
>
> Jon

By that logic, should robots like AdsBot and SurveyBot be added to robots.cfg?

- Grant



jon at endpoint

Dec 30, 2011, 4:20 PM

Post #13 of 14 (120 views)
Re: New Robot UAs [In reply to]

On Fri, 30 Dec 2011, Grant wrote:

> By that logic, should robots like AdsBot and SurveyBot be added to
> robots.cfg?

If you've confirmed they're real bots visiting your site, yes, I think so.

Jon

--
Jon Jensen
End Point Corporation
http://www.endpoint.com/



emailgrant at gmail

Jan 1, 2012, 1:56 PM

Post #14 of 14 (118 views)
Re: New Robot UAs [In reply to]

>> By that logic, should robots like AdsBot and SurveyBot be added to
>> robots.cfg?
>
>
> If you've confirmed they're real bots visiting your site, yes, I think so.
>
>
> Jon

OK, I understand the concept then. I'll put some changes together
into a patch and post it here.

- Grant

