[MKDoc-dev] Setting the User-Agent field for RSS GET requests

Chris Croome chris at webarchitects.co.uk
Thu Feb 2 14:38:18 GMT 2006


Hi

Google News blocks wget and LWP from making RSS requests -- try fetching
this feed in a browser and then with wget:

  http://news.google.com/news?q=sheffield&ie=UTF-8&output=rss

The User-Agent field can be set for LWP::Simple [1] and I guess having an
env var for this would make sense. It could be set to something
sensible by default, e.g. "MKDoc 1.6 RSS fetcher", and then it would also
be easy to change if the default doesn't work for some sites...
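
A minimal sketch of how the env var idea might look, not tested against
MKDoc itself. The variable name MKD__RSS_USER_AGENT and the two helper
subs are hypothetical (nothing in MKDoc defines them), and the sketch
assumes a plain LWP::UserAgent object rather than whatever the cron
scripts currently build:

```perl
use strict;
use warnings;

# Hypothetical environment variable name; MKDoc does not define this.
my $ENV_VAR = 'MKD__RSS_USER_AGENT';

# Default string suggested in this message.
my $DEFAULT_AGENT = 'MKDoc 1.6 RSS fetcher';

# Return the User-Agent string to send: the env var if it is set and
# non-empty, otherwise the default.
sub rss_user_agent {
    my $agent = $ENV{$ENV_VAR};
    return (defined $agent && length $agent) ? $agent : $DEFAULT_AGENT;
}

# Build an LWP::UserAgent that sends that string. LWP is loaded lazily
# so the string lookup above can be used on its own.
sub rss_ua {
    require LWP::UserAgent;
    my $ua = LWP::UserAgent->new;
    $ua->agent(rss_user_agent());
    return $ua;
}
```

A site that rejects the default could then be handled by exporting the
variable before the cron job runs, without touching the scripts.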

For now, adding a line like this to tools/cron/031..rss_routine.pl and
tools/cron/030..rss_troubleshooter.pl does the trick:

  $ua->agent('Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)');

Chris

[1] http://search.cpan.org/~gaas/libwww-perl-5.805/lib/LWP/UserAgent.pm

-- 
Chris Croome                               <chris at webarchitects.co.uk>
web design                             http://www.webarchitects.co.uk/ 
web content management                               http://mkdoc.com/   

