Quote:
Originally posted by Backov
Some technical perspective here:
A good idea, but you don't have the technology.
Your tech guy said you are hashing (md5ing) the images. A simple resize kills your review and creates a duplicate entry.
You know what else does? Adding a url. Cropping. Color correction. Slight recompression.
You are going to have a database of millions and millions of pics, a large percentage of which will be dupes.
Invest in some real tech (not md5 hashing) and try it with that. By real tech I mean that there's some very high level image recognition tech that could probably do it - but it's an investment.
I don't see this being worth your time to be quite honest.
Sorry for the downer, just MO.
|
As mentioned in a previous reply, you are correct: any manipulation of an image, even of a single bit (other than renaming the file), will produce a different hash value.
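To make that concrete, here is a minimal sketch using Python's standard hashlib. The byte string is just a stand-in for an image file's contents, and the helper name md5_hex is made up for illustration; it is not our production code.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of the raw file contents as a hex string."""
    return hashlib.md5(data).hexdigest()


# Stand-in for the bytes of an image file (assumption: any bytes will do).
original = bytes(range(256)) * 4

# Flip a single bit in the first byte, as a tiny "manipulation".
altered = bytearray(original)
altered[0] ^= 0x01
altered = bytes(altered)

# Renaming only changes the file name, not the contents being hashed.
files = {"picture.jpg": original, "renamed.jpg": original}

print(md5_hex(original) == md5_hex(altered))                          # one bit changed
print(md5_hex(files["picture.jpg"]) == md5_hex(files["renamed.jpg"]))  # rename only
```

The first comparison is between different contents and the second between identical contents, which is the whole point: the hash follows the bytes, not the name.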
Yes, it does mean there will be a database of millions of pics (there already is). The entries will never be dupes, though; each unique hash is stored only once. It is possible that a 1024x768 picture from a content provider gets resized and manipulated 5 dozen ways, resulting in 5 dozen versions of the file. But that means those 5 dozen versions of the file exist on the internet and are therefore viewable in a web browser.
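A rough sketch of what "only the unique hash will exist" means in practice, assuming a simple in-memory table keyed by digest. The class name HashIndex and its methods are hypothetical and only illustrate the idea; the real storage is not described here.

```python
import hashlib


class HashIndex:
    """Toy index: one entry per unique content hash (hypothetical design)."""

    def __init__(self):
        # digest -> review verdict (None until a person has validated it)
        self._verdicts = {}

    def add(self, data: bytes) -> bool:
        """Record the content's hash. Return True only if it was new."""
        digest = hashlib.md5(data).hexdigest()
        if digest in self._verdicts:
            return False  # byte-identical file already known
        self._verdicts[digest] = None
        return True

    def __len__(self):
        return len(self._verdicts)


index = HashIndex()
first_seen = index.add(b"same bytes")      # new entry
seen_again = index.add(b"same bytes")      # exact copy, no new entry
resized = index.add(b"same bytes, resized")  # different bytes, new entry
```

An exact copy found on a second site adds nothing, while each manipulated version gets its own entry and its own review.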
So sniffy, acting as a good surfer and not putting a load burden on a website, will find these images, which to the human eye appear the same but at the bit level are different.
Yes, it does mean lots of images to validate, but that's really our problem to deal with.
There is a lot of technology behind the suite of applications that process the data, and it's combined with a lot of elbow grease to make everything work.
Pixel analysis that looks at flesh tones is probably what you mean by real technology, and I know there are companies out there pursuing that path.
In tests with tone detection as well as keyword-type blocking, there is still room for error. When a person validates an image as being "adult content" or "cp", given strict guidelines, it is 100% accurate. There are none of the false positives that exist today in anti-spam software, pixel flesh tone analysis, and keyword searching.
We could most certainly incorporate flesh tone analysis programs into our suite of tools. We certainly aren't beholden to our own applications, and this idea has already started to be explored. Anything that speeds up the task of providing 100% accuracy is in our best interests to explore.
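For anyone wondering what flesh tone analysis might look like as a helper, here is a minimal sketch. It uses one commonly cited RGB rule of thumb for skin-coloured pixels; the function names and the idea of using the score to order the review queue are assumptions for illustration, not a description of any tool we run. A score like this can only suggest what to look at first; the person reviewing still makes the call.

```python
def is_skin_tone(r: int, g: int, b: int) -> bool:
    """Rough RGB heuristic for skin-coloured pixels (an assumption, not exact)."""
    return (
        r > 95 and g > 40 and b > 20
        and max(r, g, b) - min(r, g, b) > 15
        and abs(r - g) > 15
        and r > g and r > b
    )


def skin_fraction(pixels) -> float:
    """Fraction of (r, g, b) pixels that match the heuristic; 0.0 if empty."""
    pixels = list(pixels)
    if not pixels:
        return 0.0
    matches = sum(1 for r, g, b in pixels if is_skin_tone(r, g, b))
    return matches / len(pixels)


def order_for_review(images: dict) -> list:
    """Sort image names so the highest skin fraction is reviewed first."""
    return sorted(images, key=lambda name: skin_fraction(images[name]), reverse=True)


# Tiny hand-made "images": lists of RGB tuples (hypothetical data).
images = {
    "landscape": [(0, 0, 255), (10, 120, 30)],
    "portrait": [(220, 170, 140), (0, 0, 255)],
}
queue = order_for_review(images)
```

Because the heuristic will misfire on things like sand or wood, it is kept strictly as a way to prioritise, never to decide.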
Thanks for your thoughts. They are appreciated in this discussion about trying to tackle the problem of CP on the web.
-dj