Hello there,
2 quick questions.
1. The first concerns copyright and ethics approval. How do these affect the use of the VOSON software?
2. Is it possible to open the URL in the testAN database, or only in the original database?
thanks,
Rose
Dec 15, 2015
rob
Hi,
Thanks for your questions.
Regarding ethics approval for using web crawlers, there are a few relevant points.
The main ethical concerns that have been raised in the literature relate to:
(1) Crawling a website can use a lot of the website owner's resources (e.g. bandwidth, CPU time). Web crawlers need to be used responsibly, for example by not excessively crawling the sites of organisations that might be resource-constrained (e.g. NGOs in developing countries). This is something that the user of the VOSON crawler needs to be aware of, i.e. are you crawling the site of an organisation that might be resource-constrained and hence materially inconvenienced by the crawl? Also, the crawler needs to be designed so that there are delays between page requests. The VOSON crawler does have significant delays between page requests.
(2) It is important that web crawlers obey the robots.txt protocol, which is used by webmasters to inform crawlers which parts of the website can be crawled and which parts are ‘off limits’. The VOSON crawler obeys robots.txt.
(3) There are other issues that are relevant for all research involving digital trace data, not just research using Web 1.0 web crawlers, e.g. (a) informed consent, (b) the distinction between private and public individuals, and (c) participant anonymity. Regarding 3a, I would say it is basically impossible to gain informed consent in research involving large-scale web crawls. Regarding 3b, I believe this is less of an issue for Web 1.0 research than for Web 2.0 research: with Web 1.0 you are more likely to be identifying organisations or groups rather than individuals. Regarding 3c, my opinion is that participant anonymity is not a factor for Web 1.0 research *unless* you are attaching labels/attributes to network nodes that could be problematic. I would be loath to publish research on, for example, far-right or extremist websites where these websites are identified by URL, because (a) my labelling might be problematic (a website owner might be concerned about, and even damaged by, being given a particular label) and (b) research where network nodes are labelled with the URLs would effectively contribute to a "league table" of hate websites, which would be undesirable.
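To make points (1) and (2) concrete, here is a minimal Python sketch of a "polite" crawl loop that checks robots.txt and pauses between page requests. It is not the VOSON crawler's code, which is not shown in this thread: the user-agent name, the 5-second default delay and all function names are assumptions chosen for illustration. Only the standard library's `urllib.robotparser` is used.

```python
import time
from urllib import robotparser

USER_AGENT = "example-research-crawler"  # hypothetical crawler name
DEFAULT_DELAY = 5.0                      # assumed pause between requests, in seconds


def build_parser(robots_txt):
    """Parse the text of a robots.txt file that has already been downloaded."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed_urls(urls, parser, user_agent=USER_AGENT):
    """Keep only the URLs that robots.txt permits this user agent to fetch."""
    return [url for url in urls if parser.can_fetch(user_agent, url)]


def delay_for(parser, user_agent=USER_AGENT, default=DEFAULT_DELAY):
    """Use the site's Crawl-delay if it is longer than our own default."""
    declared = parser.crawl_delay(user_agent)
    if declared is None:
        return default
    return max(default, float(declared))


def polite_crawl(urls, parser, fetch, sleep=time.sleep):
    """Fetch each permitted URL in turn, pausing after every request.

    `fetch` is any caller-supplied function that downloads one URL.
    """
    pages = {}
    for url in allowed_urls(urls, parser):
        pages[url] = fetch(url)
        sleep(delay_for(parser))
    return pages
```

Passing `fetch` and `sleep` in as arguments keeps the sketch easy to try out without touching a real website: you can supply a dummy download function and a recording function in place of `time.sleep`.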
Regarding copyright issues, I would say that it is not a problem unless you are planning on publishing verbatim text or images from websites. VOSON is a tool for collecting hyperlink network and website text content, but it is not expected that the text content collected by VOSON would be used for qualitative research involving quotation of sentences or paragraphs from websites.
Moving on to your question about opening a URL in a "voson" database compared with a "voson-analysis" database: with voson-analysis databases, in the DataBrowser, the URL column shows the URL of the first page in the pagegroup, where "first" means the first page from that pagegroup that was collected by the crawler. So if I crawl voson.anu.edu.au and the crawler first encounters voson.anu.edu.au/page1.html, then that is what will be shown in the URL column in the DataBrowser: it effectively stands in for all the pages from that pagegroup (which generally means: hostname) that have been collected.
The Pagegroup column shows the label that is used to refer to all pages in the pagegroup (generally, it is the hostname, so in the above example it would be "voson.anu.edu.au").
So, I guess the short answer to your question is: yes, it is possible to open a URL in the testAN (voson-analysis) database, directly from the DataBrowser, but the URL will only be for one of the pages collected from that pagegroup, and it may not be representative (it is just the first page that was encountered by the crawler).
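As a rough illustration of the pagegroup idea described above, the Python sketch below groups crawled URLs by hostname and keeps the first URL encountered for each group. This is only a sketch of the behaviour as described in this reply, not VOSON's actual implementation; the function names are hypothetical, and it assumes a pagegroup is simply the hostname and that every URL includes a scheme such as `http://`.

```python
from urllib.parse import urlparse


def pagegroup_of(url):
    """Return the pagegroup label for a URL (assumed here to be the hostname)."""
    return urlparse(url).hostname


def representative_urls(crawled_urls):
    """Map each pagegroup to the first URL the crawler encountered for it.

    `crawled_urls` must be in the order in which the pages were collected.
    """
    first_seen = {}
    for url in crawled_urls:
        # setdefault stores the URL only if the pagegroup is not already present
        first_seen.setdefault(pagegroup_of(url), url)
    return first_seen
```

For the example above, if `http://voson.anu.edu.au/page1.html` is collected before any other page on that host, it becomes the URL stored against the pagegroup `voson.anu.edu.au`, regardless of which pages are collected later.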
Regards,
Rob
References (slightly self-indulgent):
Ackland, R. (2013), Web Social Science: Concepts, Data and Tools for Social Scientists in the Digital Age, SAGE Publications.
http://www.sagepub.com/books/Book234439
http://www.amazon.com/Web-Social-Science-Concepts-Scientists/dp/1849204810
Thanks for the reply, Rob. The information is invaluable. And the book is excellent.
Rose