A New VOSON Accounting System: From WCU to VAU

As you know, VOSON has always had a method of accounting for system resource use when running web crawls (which are the most resource-intensive tasks that VOSON performs). This method was built on a unit of account called the “WebCrawlUnit”, or WCU, which took into consideration the selected webmining parameters (such as maximum inbound and outbound links, page depth, etc.) to estimate the system resource use for each crawl.

While this system was effective at distinguishing between ‘small’, ‘medium’ and ‘large’ networks, it was an ad hoc approach. In some cases the estimated WCU for a particular crawl (which is what was compared against a Tier’s available monthly WCU credit) was far off from the actual WCU expended.

The VOSON Activity Unit (VAU)

To pin down an accounting system that provides a more accurate estimate of system resource use, we introduced an approach that takes into consideration all available information about web crawls. We use a corpus of existing web crawl data (crawl parameters and system resource use) to form a statistically consistent estimate of resource use given the webmining parameters selected. The estimate covers both the memory and the CPU usage of a particular crawl, and is given in ‘VOSON Activity Units’ or VAU (denominated in 100MB-sec, i.e. the product of memory use in units of 100 MB and CPU wall time in seconds).
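To make the unit concrete, here is a minimal sketch of how a measured memory footprint and run time could be converted to VAU. It assumes that 1 VAU corresponds to 100 MB of memory held for one second, which is our reading of the "100MB-sec" denomination above. The function name and the sample figures are hypothetical and are not part of VOSON itself.

```python
def vau_from_usage(memory_mb: float, wall_seconds: float) -> float:
    """Convert memory use (MB) and wall time (seconds) to VAU.

    Assumption: 1 VAU = 100 MB-sec, i.e. 100 MB of memory held for 1 second.
    """
    if memory_mb < 0 or wall_seconds < 0:
        raise ValueError("memory and time must be non-negative")
    return memory_mb * wall_seconds / 100.0


# Hypothetical figures: a crawl holding 200 MB for 4 seconds.
print(vau_from_usage(200, 4))  # 8.0
```

Note that this only converts measured usage into VAU. The estimate shown before a crawl runs comes from the statistical model described above, not from a direct calculation like this one.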

Your Monthly Resource Use Allocation

We’ve worked hard to provide a better estimate of system resource use while at the same time keeping the VAU monthly allocation per Tier in line with the earlier WCU allocation per Tier. This means that users will be able to complete a crawl with roughly the same proportion of their monthly allocation of system resource use as before, but with increased confidence that actual system resource use will be more in line with estimated resource use.

Keeping It Fair: The Bare Minimum

Naturally, there are differences between a crawl using WCU as an accounting unit, and the same crawl using VAU as the accounting unit. This is because VAU is simply a more accurate measurement of resource use than WCU. But it might also be that in spite of this increased accuracy, the estimated VAU is still different from the actual VAU when a crawl is run.

To ensure that this difference doesn’t cause any unwelcome surprises, when you run a crawl, only the minimum of the VAU estimate and the actual VAU used is debited against your monthly allocation of VAU.

For example, suppose you set up a crawl to run, and the resource use estimate is 8 VAU. When the crawl is actually run, it uses only 6 VAU. Then you only pay for what you actually use, i.e. 6 VAU and not 8 VAU is deducted from your monthly VAU allocation. But if the estimate is only 4 VAU, and the actual VAU use is 6 VAU, you only pay the 4 VAU estimate. This eliminates the ‘bad news’ surprise of finding that the estimated VAU was much lower than the VAU the crawl actually used.
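The debit rule described above can be sketched in a few lines. The function names and the monthly allocation of 100 VAU are hypothetical, chosen only for illustration. The rule itself (debit the smaller of the estimate and the actual use) is the one stated above.

```python
def debited_vau(estimated: float, actual: float) -> float:
    """Return the VAU charged for a crawl: the smaller of estimate and actual use."""
    return min(estimated, actual)


def remaining_allocation(allocation: float, estimated: float, actual: float) -> float:
    """Return the monthly allocation left after debiting one crawl."""
    return allocation - debited_vau(estimated, actual)


# The two cases from the example above.
print(debited_vau(8, 6))  # 6: actual use is below the estimate
print(debited_vau(4, 6))  # 4: the estimate caps the charge

# Hypothetical monthly allocation of 100 VAU.
print(remaining_allocation(100, 8, 6))  # 94
```

Either way, the charge can never exceed the estimate you saw before starting the crawl.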

Continually Improving

Finally, every crawl’s webmining parameters and actual VAU use are used to improve the VAU statistical estimate. This means that future estimates will provide tighter and tighter bounds around the actual VAU expenditure, so any under- or over-estimation of the resource use of a crawl will become smaller. (Please note that we don’t use the content of your network in any way for this improvement. Only the webmining parameters, such as the selected size of the inbound or outbound network, are used in conjunction with how long a particular crawl took.)

Let Us Know

We hope that the above discussion has explained both why and how the move to VAU occurred. If you have any questions or concerns, just let us know on the Community Forum.

-- The Uberlink Team