Quote:
Originally posted by SpaceAce
I wasn't going to say anything, but OK.
9K is nothing. I can have 50,000 working proxies by this weekend without even breaking a sweat. It's no trick to scan several thousand per day on DSL and there are plenty of sites out there with working proxy lists updated daily. They're all free, too. If you add in certain pay services, the list gets even bigger.
|
I doubt the 50K number. This isn't a personal attack; I'm just speaking from experience running a proxy database for more than a year. We spider a lot of the free sites, do a bit of probability-based scanning (e.g. concentrated probes where we have multiple proxies in the same /24, and test probes where we have multiple proxies in the same /16), and test inbound connections for proxies. I ran this off my cable modem for 10 months, so indeed, DSL would have no problem scanning thousands. Now that we're sitting on 10Mbps we've improved our scanning drastically.
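To make the /24 and /16 idea concrete, here's a rough sketch of how you might pick ranges to probe from a list of known proxies. This is not our actual scanner code; the function name, the `min_hits` threshold, and the sample addresses (documentation ranges, not real proxies) are all made up for illustration.

```python
import ipaddress
from collections import Counter

def dense_prefixes(known_proxies, prefix_len, min_hits=2):
    """Return networks of the given prefix length holding at least min_hits known proxies."""
    counts = Counter(
        # strict=False lets us pass a host address and get its enclosing network
        ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        for ip in known_proxies
    )
    return {net: n for net, n in counts.items() if n >= min_hits}

# Hypothetical sample data (reserved documentation ranges, not real proxies).
known = ["192.0.2.10", "192.0.2.77", "198.51.100.5", "198.51.7.9", "203.0.113.1"]

concentrated = dense_prefixes(known, 24)  # /24s that might justify a full sweep
test_probes = dense_prefixes(known, 16)   # /16s that might justify light sampling

for net, n in concentrated.items():
    print(f"concentrated probe: {net} ({n} known proxies)")
for net, n in test_probes.items():
    print(f"test probe: {net} ({n} known proxies)")
```

The two 198.51.x.x hosts sit in different /24s, so they only show up at the /16 level. That's the kind of range where a few test probes make more sense than a full sweep.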
Quote:
|
I don't know either of the sites you mentioned in your post, but I do know of others who update often and have actual working proxies. Like I said, though, even raw scanning will produce tons of open proxies, most of which will probably not be on your 9K list. When you factor in how fast proxies die and fresh ones get spread around, it's pretty much an impossible task to keep up.
|
I would agree that some of the proxies you'd find wouldn't be in our list - no list is comprehensive, and no list ever will be. However, I'd like to speak on the idea of how fast proxies come and go. In particular I'd like to counter with a few things from our internal stats page:
The oldest open proxy in the database is 194.78.zzz.240:80 (added 393 days ago).
The oldest anonymous open proxy in the database is 194.78.zzz.240:80 (added 393 days ago).
...not all proxies are as transient as you may think...
Code:
Total:
There are currently 25144 records in the database, including both
tracked and untracked proxies.
Tracked:
9459 proxies are open (1798 of those are anonymous), 13371
are closed, and 141 are in an unknown state due to a connection
timeout.
Untracked:
There are 2169 proxies which have been marked as duplicates
(transparent proxies which point at another proxy, etc.) and 3
proxies have been barred from future probing by administrative request.
1 proxy was found to be operated by the US government or
military without reverse DNS, and manually blocked from further
probing. This proxy would never have been probed, scanned, or
added to the database to begin with if proper reverse DNS had
been in place.
...though they certainly are transient to an extent. Point being, you could give me a list of a billion supposed proxy IPs, but it's worthless unless it's rescanned every day. Rosinstruments' site (and many other free proxy sites) are good examples of "big proxy lists" gone awry, where not only do many of the IPs listed not actually proxy HTTP connections, they aren't tested routinely enough to make the list worth anything. Even if you could come up with 50K proxies by this weekend, 40K of them would be dead by the time you finished compiling the list. And there's the rub. No other site probes as frequently as we do, and no other site lists as many real, live, verified proxies as we do. I've been looking for viable competition for 393 days and I haven't found it yet.
I wish I had some way to determine the number of hosts probed (from hundreds of sources, including both free and paid proxy lists) vs the number which were actually open proxies at any given time. Unfortunately we don't keep track of hosts tested which were not open proxies on a historical basis. However I can say from looking at the stats from the most recent spidering run - which encompasses a variety of sites -
$ wc -l *list.txt
17429 badlist.txt
237 goodlist.txt
That is, out of nearly 18,000 "proxies" listed at various sites, fewer than 250 were actually open proxies. That's a pretty fucking bad track record in my opinion. OpenProxies.com's scanners fire up every 2 hours (probing the entire list of opens takes longer than that, so the runs overlap), but you're rarely going to see an IP in our list which hasn't been verified open within the last 12 hours. It's difficult to find such accuracy elsewhere unless you do it yourself - and that's the whole idea, not everyone can do it themselves. If you can, it doesn't bother me in the least.
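For anyone who wants the hit rate spelled out, the arithmetic on those two line counts is simple enough. The variable names here are just for illustration.

```python
bad = 17429   # lines in badlist.txt (listed hosts that were not open proxies)
good = 237    # lines in goodlist.txt (listed hosts that were open proxies)

total = bad + good
hit_rate = good / total

print(f"{good} of {total} listed hosts were open proxies ({hit_rate:.2%})")
```

That works out to 237 of 17666, or about 1.34%, for that one spidering run.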
Quote:
|
A service like yours is an OK place to start, but it isn't a miracle cure. For the record, the most verified working proxies I have ever had at one time is about 140,000. I will note, though, that about 1/2 - 2/3 of them were not really anonymous (spilled IPs), and many of them were sequential (several proxies running on one server or group of servers), but still...
|
I find it difficult to believe that you had 140K valid, working open proxies at a time. Again, this is not a personal attack, I don't know you so I can't argue. It's merely a reflection of the facts I've seen. I refer back to atomintersoft who claims to have 200K+ proxies, most of which either are not public proxies, or were never proxies to begin with; and their number has been going up for years - if someone had 140K at a time, wouldn't there be freelists out there with that many, or more?
I think back to the days of cyberarmy.com's proxy list, where their "tester" would report positive if a given host was responding on the specified port. There was no actual attempt made to use the proxy. There are many such so-called "proxy lists" made up of IPs which are nothing more than webservers which don't actually relay HTTP requests.
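The difference between the two kinds of test looks roughly like this. It's a minimal sketch, not anybody's real tester: the function names are mine, and it assumes you control a target URL whose body contains a known marker string, so a plain webserver serving its own page can't pass.

```python
import socket
from urllib.parse import urlsplit

def port_is_open(host, port, timeout=5.0):
    """The weak test: only proves that something accepts TCP connections."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def build_proxy_request(target_url):
    """Build a GET with an absolute URL, which is how a client asks a proxy to fetch a page."""
    host = urlsplit(target_url).netloc
    return f"GET {target_url} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")

def response_proves_relay(raw_response, marker):
    """True only for a 200 whose body contains the marker known to be on the target page."""
    head, _, body = raw_response.partition(b"\r\n\r\n")
    status_line = head.split(b"\r\n", 1)[0].split()
    return len(status_line) >= 2 and status_line[1] == b"200" and marker in body

def relays_http(host, port, target_url, marker, timeout=5.0):
    """The stronger test: actually ask the host to fetch target_url on our behalf."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(build_proxy_request(target_url))
            chunks = []
            while True:
                data = sock.recv(4096)
                if not data:  # peer closed the connection; response is complete
                    break
                chunks.append(data)
    except OSError:  # refused, unreachable, or timed out
        return False
    return response_proves_relay(b"".join(chunks), marker)
```

A webserver on port 80 passes `port_is_open` every time, but it fails `relays_http` because it answers with its own content instead of the marker page.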
Quote:
|
The last time I participated in a discussion like this on a message board my ICQ went nuts with people wanting proxies.
|
In other words, the majority of people can't find 50,000 proxies on their own by this weekend, eh?
Anyway, I ain't saying you're BSing, just saying I have a tough time believing your numbers, and an even tougher time believing that those numbers could possibly be recent.