Memory problems/hung process

  • Question

  • User-180720870 posted

    I am trying to run 500k URLs through the SEO Toolkit, but it always seems to wind up failing on me.  It looks like whatever is taking up memory within the process is not releasing it back to the OS for further use.  Is anyone else having a problem like this?  Right now I am stuck running a smaller set of URLs, but I really need to be able to run as much as possible.  I am running with 7.5G of RAM on EC2, and so far this is unusable for me.  Any help is appreciated.


    Sunday, May 2, 2010 3:57 PM

All replies

  • User-180720870 posted

    It looks like what it is doing is loading everything being downloaded into RAM.  The process is still running, just slowly.  I have about 8.5 GB of downloaded content, which exceeds my available RAM.

    Sunday, May 2, 2010 4:06 PM
  • User-47214744 posted

    I would recommend dividing the analysis of your site into multiple areas. For example, if you have an area for /blogs/ and one for /events/, then perform two analyses: one for yoursite/blogs/ and one for yoursite/events/. This way the results will be more manageable.

    I would not recommend running a single analysis that downloads more than 200,000 URLs, since that will result in an extremely large set of violations that would be really hard to fix anyway. It is usually easier to tackle one area at a time.
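    Splitting a crawl by area amounts to partitioning the URL list by its first path segment. As a hypothetical illustration (this helper is not part of the toolkit), such a partition could look like:

    ```python
    from collections import defaultdict
    from urllib.parse import urlparse

    def partition_by_area(urls):
        """Group URLs by their first path segment (e.g. /blogs/, /events/)."""
        areas = defaultdict(list)
        for url in urls:
            segments = [s for s in urlparse(url).path.split("/") if s]
            area = segments[0] if segments else ""
            areas[area].append(url)
        return dict(areas)

    urls = [
        "http://yoursite/blogs/post-1",
        "http://yoursite/blogs/post-2",
        "http://yoursite/events/launch",
    ]
    # Each resulting group can then be crawled as its own, smaller analysis.
    print(partition_by_area(urls))
    ```

    Each bucket stays well under the size where the report becomes unmanageable.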

    Monday, May 3, 2010 12:05 AM
  • User-180720870 posted

     Thanks for the response and the sane advice on tackling a subset instead of the whole site at once.

     And yes, I had already run it on just 20k URLs, and there were already too many violations to look at in a single sitting, so running fewer makes sense as well.

     One more question.  Does it help memory management at all to either check or un-check the option to store local copies of the pages?

    Monday, May 3, 2010 3:23 AM
  • User-47214744 posted

    That feature will only help in terms of disk storage, since the pages will not be stored locally, and it could speed up the analysis since almost no disk I/O will occur. However, it will not change the memory management behavior.

    Monday, May 3, 2010 2:31 PM
  • User-180720870 posted

     What's strange is that all available RAM is eaten up by the crawler and the box becomes unresponsive even when running only 200k.  Maybe I need to try running fewer URLs?  I am also storing the pages locally, which I am not sure is causing this or not.  The amount of data downloaded exceeds the amount of RAM I am running with, but since it's stored locally, why chew through the memory?  I am running this on EC2, so it would be easy enough for me to just start up a new x-large instance.  I am running on Large right now.

    Monday, May 3, 2010 5:34 PM
  • User-47214744 posted

    The reason it takes so much memory is that we keep in memory the relationships between links and pages, along with their metadata. Note that we do not keep the contents (responses) in memory; those are stored on disk if "store pages locally" is checked. However, the contents are rarely accessed (only when you click the Contents tab or the Word Analysis tab), while all the links, violations, metadata, etc. are kept in memory for performance reasons, and after 150,000 URLs or so performance starts degrading.
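    The split described there, lightweight metadata in RAM and rarely read page bodies on disk, can be sketched minimally. This is a hypothetical illustration of the general pattern, not the toolkit's actual implementation, and all names here are invented:

    ```python
    import os
    import tempfile

    class CrawlStore:
        """Sketch: keep per-URL metadata in memory, write response bodies to disk."""

        def __init__(self, cache_dir):
            self.cache_dir = cache_dir
            # Small, frequently queried fields (status, links) stay in RAM.
            self.metadata = {}

        def record(self, url, status, links, body):
            self.metadata[url] = {"status": status, "links": links}
            # The rarely accessed response body goes straight to disk.
            path = os.path.join(self.cache_dir, "%08x.html" % (hash(url) & 0xFFFFFFFF))
            with open(path, "wb") as f:
                f.write(body)
            return path

    with tempfile.TemporaryDirectory() as cache:
        store = CrawlStore(cache)
        store.record("http://yoursite/blogs/a", 200, ["http://yoursite/"], b"<html></html>")
        print(store.metadata["http://yoursite/blogs/a"]["status"])  # 200
    ```

    Even with bodies on disk, the in-memory link/violation graph still grows with every URL crawled, which matches the degradation described above.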

    So, bottom line, my recommendation is to split your analysis by sub-directory when analyzing such large sites. Even if you had the memory, I would not recommend having such a large report anyway, since the number of violations will be overwhelming and the performance of the query engine would start to be annoying (waiting 3 seconds for queries to execute). The biggest problem is that for every URL we keep things like links and violations, and from my analysis the ratio is usually 1 URL = 45 links = 13 violations. That means that for 150,000 URLs you will have about 6.7 million links and about 2 million violations, which take a lot of memory and time to query/filter.
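    The scaling in that ratio checks out as simple arithmetic:

    ```python
    # Ratio quoted above: 1 URL = 45 links = 13 violations.
    URLS = 150_000
    LINKS_PER_URL = 45
    VIOLATIONS_PER_URL = 13

    links = URLS * LINKS_PER_URL            # 6,750,000 ("about 6.7 million")
    violations = URLS * VIOLATIONS_PER_URL  # 1,950,000 ("about 2 million")
    print(links, violations)  # 6750000 1950000
    ```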


    Monday, May 3, 2010 7:09 PM