Voson team.

In checking my data, I noticed that one URL was listed as having no indegrees and no outdegrees when, in fact, it should. This is a URL I 'preserved', and I wonder whether the act of preserving limits the Voson crawl to *just that subdirectory*, or whether there is some other reason the links were not picked up.

The *preserved* URL is: http://www.geocities.ws/evanwolfson/

However, there are links to other seed orgs in my database just a little further down the directory that were not picked up by Voson (see: http://www.geocities.ws/evanwolfson/ftm_links.htm).

I read the user guide on preserving, and it seems I might need to add something to the end of the directory, but I am unclear about what.

Thanks,

Michael

  • anon

    Hi,

    I will have to look into this case to see if the crawler successfully visited this web page.

    Normally, I do the node pre-processing (e.g. pagegrouping, preserving, etc.) after the crawl has completed. If you did the preserving *before* the crawl, then it may have affected the crawler's behaviour, but I can't immediately see why.

    So I will check to see what links were extracted from this seed page and whether preserving is having an unintended impact on the links to other sites.

    Rob

    Dec 31, 2012
  • anon

    I preserved the URL after the crawl.

    Well, sort of: I *purged* a similar URL with the same root directory (but an incorrect subdirectory) and then *added* this correct version as a seed site. Then, after I was notified that the seed had been crawled, I went back and preserved this corrected URL (with its subdirectory).

    Thanks,

    M

    Jan 03, 2013
  • anon

    This was looked into and, as mentioned in email correspondence with the OP, it was established that the seed site had in fact been crawled correctly.

    So, this problem is closed.

    Rob

    Jan 16, 2013