How can I avoid server replies "503 Too many requests"?
Charley_Dixon
Posts: 30
Reputation: 0

Message # 1 | 0:43 AM
Hi,

I'm trying to set up daily backups of some pages of my site (more specifically, /index/*). I wrote a small script that fetches /sitemap.xml and compares its timestamps with the ones stored from the previous run. If there is a more recent version of a page /index/foo, it fetches /pda/index/foo and stores the page together with the new timestamp.
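The comparison step described above could look roughly like the sketch below. It assumes the sitemap follows the standard sitemaps.org format with `<loc>` and `<lastmod>` elements; the function names `parse_sitemap` and `changed_pages` are made up for illustration and are not taken from the original script. A page counts as changed when its stored timestamp differs from the current one.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemaps.org sitemap files (assumed here).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_text):
    """Return {url: lastmod}; lastmod is '' when the entry has none."""
    root = ET.fromstring(xml_text)
    entries = {}
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        lastmod = url.findtext("sm:lastmod", default="", namespaces=SITEMAP_NS).strip()
        if loc:
            entries[loc] = lastmod
    return entries


def changed_pages(current, previous):
    """URLs that are new or whose lastmod differs from the stored value."""
    return sorted(url for url, stamp in current.items() if previous.get(url) != stamp)
```

Here `previous` would be the dictionary saved at the end of the last run (for example as JSON), and `current` the result of `parse_sitemap` on the freshly downloaded file.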

The problem is that my site has over 100 pages, so shortly after I started my script the server began replying with "503 Too many requests" and I can't fetch pages anymore. The question is: what is the minimum interval between requests that avoids such responses?

Also, is there some kind of XML API, so that instead of saving /pda/index/foo I could, for example, request /xml/index/foo and get XML containing only the page content (for example, as CDATA)?

Natashko
Posts: 3366
Reputation: 171

Message # 2 | 12:47 PM
Charley_Dixon, this banning was introduced recently, first of all to protect against spam (in comments and forums) and against copying of site content. Second, we need to protect the system from badly written scripts that put a heavy load on it.

To avoid it, make sure your website doesn't contain scripts that automatically send requests to the server for dynamic pages more often than once per minute. You may send 1 to 2 requests per minute (5 at most). In other cases, use Informers.
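A backup script could respect that limit along the lines of the sketch below. It is only an illustration: `fetch_page` and `fetch_all` are hypothetical names, the 60-second default corresponds to the conservative end of the range above (one request per minute), and stopping at the first 503 is a design choice of this sketch, not documented server behaviour.

```python
import time
import urllib.error
import urllib.request


def fetch_page(url):
    """Download one page and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def fetch_all(urls, interval=60.0, fetch=fetch_page, sleep=time.sleep):
    """Fetch urls one by one, waiting `interval` seconds between requests.

    Stops at the first HTTP 503 and returns the pages fetched so far,
    so the script does not keep hitting a server that is refusing it.
    """
    pages = {}
    for i, url in enumerate(urls):
        if i > 0:
            sleep(interval)  # no wait before the very first request
        try:
            pages[url] = fetch(url)
        except urllib.error.HTTPError as err:
            if err.code == 503:
                break
            raise
    return pages
```

With one request per minute, backing up a site of about 100 pages takes roughly 100 minutes, which is still fine for a daily job.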

Charley_Dixon
Posts: 30
Reputation: 0

Message # 3 | 4:13 AM
Thank you Natashko, I'll try 1-2 requests per minute.

Added (2010-11-12, 10:13 PM)
---------------------------------------------
Just a few notes in case someone reads this thread:


  • The /sitemap.xml is not updated every time someone saves a page, so you cannot rely on the modification times in that file to track changes on your site; you have to download all pages. Moreover, that file is not updated after you add or delete pages, so the best way to get a list of pages is to parse the site menu.
  • There is an action log available at /panel/?a=log where you can find all the required information: who, when, and which page. Making a script access that action log is left as an exercise for the reader :)
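For the first note, getting the page list out of the menu might be sketched as follows. The menu markup is not shown in this thread, so the sketch simply collects every link on a downloaded page whose href starts with /index/; the class and function names are hypothetical, and absolute URLs would need an extra normalisation step.

```python
from html.parser import HTMLParser


class IndexLinkCollector(HTMLParser):
    """Collect unique href values that start with a given path prefix."""

    def __init__(self, prefix="/index/"):
        super().__init__()
        self.prefix = prefix
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if href.startswith(self.prefix) and href not in self.links:
            self.links.append(href)


def index_links(html_text):
    """Return the /index/ links found in html_text, in document order."""
    collector = IndexLinkCollector()
    collector.feed(html_text)
    return collector.links
```

The resulting list can then be fetched page by page at whatever request rate the server allows.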
Post edited by Charley_Dixon - Saturday, 2010-11-13, 4:40 AM