
[Closed] Would it be wrong? (web data retrieval Q)

Posts: 13916
Free Member
Topic starter
 

I am after contact data for a certain type of company.
There is an official (i.e. governing authority) web site that allows you to search for these companies and retrieve the contact data, although the search page only returns 100 results each time. This could lead to missed contacts (and duplicates but I can deal with that in other ways).

I know how the link to each piece of company data is created and I have recreated this in a database (basically a URL including a unique number). So now I have 500,000 web links (most of which will not work, but some will) that my scraper can crawl through to retrieve the data, without me carrying out a lot of manual searches to get 100 links each time.
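
For anyone following along, the link-building step might look something like the sketch below. The URL pattern, the parameter name and the ID range are all placeholders, since the thread names neither the site nor its real scheme.

```python
from typing import Iterator

# Hypothetical URL pattern: the real site and parameter name are not given here.
URL_TEMPLATE = "https://example.org/company?id={company_id}"


def candidate_urls(start: int, stop: int) -> Iterator[str]:
    """Yield one candidate URL for each ID in the half-open range [start, stop)."""
    for company_id in range(start, stop):
        yield URL_TEMPLATE.format(company_id=company_id)


if __name__ == "__main__":
    # Print the first few candidates; most IDs are expected not to resolve.
    for url in candidate_urls(1, 4):
        print(url)
```

Generating the links lazily like this avoids holding all 500,000 strings in memory at once, although storing them in a database as described works just as well.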

So, given that the contact data is publicly available elsewhere (just a PITA to collect) would I be a bad boy if I clicked 'go'?


 
Posted : 28/06/2013 10:06 am
Posts: 251
Full Member
 

Screen scraping is a regular thing for a lot of price consolidation/comparison sites.

If someone doesn't want it to happen they should code to prevent it.


 
Posted : 28/06/2013 10:08 am
Posts: 2
Free Member
 

I would schedule the collection process. Hammering their website with half a million hits might not feel like a very nice thing to happen to them.

I know if one of my web sites got half a million hits from one IP address I'd treat it as an attack and make an effort to track down the perps. In fact, some of our websites would block the traffic quite quickly as they'd deem it a DOS attack.
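
A rough sketch of what "scheduling" could mean in practice: a fixed pause between requests. The `fetch` callable and the two-second default are assumptions for illustration, not anything the site is known to require.

```python
import time
from typing import Callable, Dict, Iterable


def crawl_politely(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Fetch each URL in turn, pausing between consecutive requests.

    `fetch` is a caller-supplied function (hypothetical here) that takes a URL
    and returns the page body. `sleep` is injectable so the pacing can be
    tested without actually waiting.
    """
    results: Dict[str, str] = {}
    for index, url in enumerate(urls):
        if index > 0:
            # Pause before every request except the first.
            sleep(delay_seconds)
        results[url] = fetch(url)
    return results
```

With a two-second gap, 500,000 requests need about 1,000,000 seconds of waiting, which is roughly 11.6 days, so the delay is a trade-off between politeness and how long the job runs.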


 
Posted : 28/06/2013 10:23 am
Posts: 13916
Free Member
Topic starter
 

Yep I was going to leave the start until tonight.

"I know if one of my web sites got half a million hits from one IP address I'd treat it as an attack and make an effort to track down the perps"

I did think about this, but what would they do then? I'm only retrieving data that they make available anyway.
I may not bother as 500,000 hits isn't going to cover it really.


 
Posted : 28/06/2013 10:35 am
Posts: 2
Free Member
 

Depends who they are and what resources they have at their beck and call really ;-). My guys would be chomping at the bit to make a retribution hit for this sort of behaviour. (only joking)

As I say though, if you spread it out they probably wouldn't be too bothered, but the last thing you need is a complaint being raised with your ISP, or even for them to blacklist you.


 
Posted : 28/06/2013 10:43 am
Posts: 0
Full Member
 

Hire a botnet for an evening?


 
Posted : 28/06/2013 10:59 am
Posts: 5686
Full Member
 

What would they do? They'd block or rate limit your IP, so if it is a work IP or similar then it could have a negative impact on other people being able to work against that site should they need to.

Keep it under control and I doubt you'll have a problem getting all the data.
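
One way to "keep it under control" is to slow down as soon as the server pushes back. The sketch below assumes the site signals rate limiting with HTTP 429 or 503, optionally with a `Retry-After` header in seconds; whether this particular site does so is unknown. The base delay and cap are arbitrary illustration values.

```python
from typing import Optional


def backoff_delay(
    status: int,
    retry_after: Optional[str],
    attempt: int,
    base: float = 5.0,
    cap: float = 600.0,
) -> Optional[float]:
    """Return how many seconds to wait before retrying, or None if no retry is needed.

    `attempt` counts retries already made, starting at 0.
    Only the numeric form of Retry-After is handled; an HTTP-date value
    falls through to exponential backoff.
    """
    if status not in (429, 503):
        return None
    if retry_after is not None and retry_after.strip().isdigit():
        # Honour the server's own hint, but never wait longer than the cap.
        return min(float(retry_after.strip()), cap)
    # No usable hint: double the wait on each attempt, up to the cap.
    return min(base * (2 ** attempt), cap)
```

A crawler would call this after each response and sleep for the returned number of seconds before retrying the same URL.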


 
Posted : 28/06/2013 11:27 am
Posts: 3544
Free Member
 

"I did think about this, but what would they do then? I'm only retrieving data that they make available anyway."

There's a big difference between getting a bit of data to book one holiday and hitting the same site 50,000 times for all their holiday data, for example.

I've seen sites drop in a CAPTCHA blocker after a certain number of hits from the same IP so you might not get all the data you were expecting.


 
Posted : 28/06/2013 11:41 am
 Aidy
Posts: 2941
Free Member
 

Does the website in question have terms of use?
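
Terms of use have to be read by a human, but if the site also publishes a robots.txt (an assumption, the thread doesn't say), that part can be checked in code. A minimal sketch using Python's standard `urllib.robotparser`; the user agent string and URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Example robots.txt content, invented for this sketch.
    example = "User-agent: *\nDisallow: /private/\n"
    print(allowed_by_robots(example, "my-scraper", "https://example.org/company?id=1"))
    print(allowed_by_robots(example, "my-scraper", "https://example.org/private/x"))
```

Passing a robots.txt check does not settle the terms-of-use question; it only covers what the site has stated in machine-readable form.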


 
Posted : 28/06/2013 1:05 pm
