The following is a question from a VOSON user, submitted via email (edited):
"I did notice that links_to_ids was non-empty in our data, however I had specified NOT to look for inbound links (webmining parameters). In the crawl, I was exclusively interested in outbound links, so I was surprised to see anything in this field. Can you help me understand which parameters control whether or not inbound links will be discovered by the crawl?"
The answer was:
A non-empty links_to_ids field is expected even when the crawl only follows outbound links. The field lists, for each site in the database, the sites that link to it, and it is filled in for sites discovered from the crawl of the seed sites. That is, if seed "i" links to "j", then there will be a row for "j" in the database, and the links_to_ids field for "j" will include "i" (as "i" is a site that sends a link to "j"). Note that seed sites may also have non-empty links_to_ids fields if they receive links from other seeds.
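The relationship can be illustrated with a minimal Python sketch. This is not VOSON's implementation: the function name build_links_to_ids, the (source, target) pair representation, and the example sites are all assumptions made for illustration. It only shows how a links_to_ids style field can be derived from outbound links alone.

```python
from collections import defaultdict


def build_links_to_ids(outbound_links):
    """Map each link target to the sorted list of sites that link to it.

    outbound_links is an iterable of (source, target) pairs, i.e. the
    outbound links found while crawling the seeds (hypothetical format).
    """
    links_to = defaultdict(set)
    for source, target in outbound_links:
        # An outbound link from `source` is, seen from `target`,
        # a link it receives, so `source` is recorded against `target`.
        links_to[target].add(source)
    return {site: sorted(sources) for site, sources in links_to.items()}


# Hypothetical crawl: seeds "i" and "k"; only outbound links are followed.
outbound_links = [("i", "j"), ("k", "j"), ("i", "k")]
links_to_ids = build_links_to_ids(outbound_links)

print(links_to_ids["j"])    # discovered site "j" is linked to by both seeds
print(links_to_ids["k"])    # seed "k" receives a link from seed "i"
print("i" in links_to_ids)  # seed "i" receives no links in this example
```

In this sketch the discovered site "j" and the seed "k" both end up with non-empty entries, although no inbound-link search was performed; the entries are a by-product of recording the seeds' outbound links.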