Clearly a way better and cheaper solution than the FSC/APAP fingerprinting approach.
Great post, borked! |
Quote:
The OP was aimed at *preventing* content from ever being uploaded. Then it evolved to injecting user data to track the pirate down, and then to detecting copyrighted content, to help automate finding what's already out there. The FSC solution is to prevent content getting onto those tube sites that are part of the circle, which I'm not too hot about since that leaves the others free to do as they please. Let's just say they are complementary approaches to the same goal. My goal is to evolve the thread to be able to detect movies, but I fear that may never happen outside of a "proof of concept" sandbox due to the bandwidth involved, but hey, I can surprise myself sometimes :upsidedow All this knowledge btw is already out there in the form of scientific papers that show ways to detect identical/similar images/videos. The hard part is implementing it cost-effectively in a real-world situation. |
Quote:
I just posted that in sheer frustration when I was tired and thinking to myself "why am I even bothering with all this frikken complicated shit?" damn time-sensitive edit gfy button! |
Quote:
and yes, I've sped things up by 1000x. This is with all your images pre-hashed (which is kinda slow at ~0.1 sec per image, but new images can easily be added to the db; only the initial compile is going to be slow) and then comparing 1 image against this hash db: Quote:
About as good as it's going to get :winkwink: --edit: can someone verify that calculation of images/hr, based on intval( 3600 / ( (1 / 55) * $time ) ), cos my brain is really fried! |
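For anyone wanting to check the images/hr formula, here is a minimal sketch in Python rather than PHP. It assumes, as the formula implies, that `$time` is the number of seconds one query takes against the 55-image hash db. The function name and the sample timing are made up for illustration and are not taken from the script itself.

```python
def comparisons_per_hour(query_seconds: float, db_size: int = 55) -> int:
    """Image-vs-image comparisons per hour, given the time one query takes
    against a hash db of db_size images (mirrors the intval() formula)."""
    seconds_per_comparison = query_seconds / db_size  # (1 / 55) * $time
    return int(3600 / seconds_per_comparison)         # intval( 3600 / ... )


# Hypothetical timing: one query against the 55-image db takes 0.02 s.
print(comparisons_per_hour(0.02))
```

The formula holds up: dividing the query time by 55 gives the cost of a single hash-against-hash comparison, and 3600 divided by that is comparisons per hour.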
To give an idea of size, the hash db is 8.0KB for a directory holding 16MB of images (55 in total). Can't get better than that with how I'm hashing things...
|
Quote:
If on a much better server your script can go up to comparing 10mil pics/hour, that means you can run 10 image checks/hour against a database of one million pics (a pretty average db; many programs have that many and often more). That means ~250 checks/day. Practically, that means you can check about 1 thread/day at pornbb or similar major boards. Or, in terms of posts, that's about 100 posts/day (an average post has more than one preview attached). To give you some idea of the size of the task, major boards like pornbb or saff boast 5-10K posts/day. Granted, not all of them contain pictures to analyze (most of them are just "thanks"), but it is safe to assume that no fewer than 1K posts/day will contain some graphics to compare with the database. That requires 10x more computing power than we have according to our best estimate. And you need to control at least a dozen of the major boards to make sure your stuff is not easy to find, hence a 100x faster script is necessary, one which could do about 1 billion comparisons/hour. I'm talking boards only because that's where most of the picture piracy takes place. Tubes and torrents mostly steal videos (although photo content is not uncommon for torrents too, boards host much more of it). |
You can optimize your search of course, and only analyze threads/posts that contain some specific keywords. You can also narrow your database and compare with either your most recent or most pirated pictures (sets).
But that raises new issues/questions:
1. That requires much more sophisticated spidering software that filters what you need.
2. Keyword filters are bound to be inaccurate, especially for mainstream/hardcore/babe sites.
3. People will still be able to post your stuff that's not in the db.
All in all, simply brute forcing all new posts and comparing them against your entire database is the REAL solution, while everything else is bound to be less accurate and miss a lot of your stuff posted illegally. Thus, please find us some way to make 1 billion comparisons/hour :) Meanwhile, effort on search optimization (for those who still want to go that route) would be better spent on adding more boards to the pool. |
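The back-of-envelope numbers in this post can be reproduced with a few lines of Python. The figures are the ones quoted in the post (10 million comparisons/hour, a 1 million image db, a 100x speed-up target). The function name is only for illustration, and brute-force comparison of every query image against every db entry is assumed.

```python
def images_checked_per_day(comparisons_per_hour: int, db_size: int) -> float:
    """How many query images can be brute-forced against the whole db per day."""
    return comparisons_per_hour / db_size * 24


DB_SIZE = 1_000_000  # hashes in the reference db, as in the post

# Current estimate: 10 million comparisons/hour.
print(images_checked_per_day(10_000_000, DB_SIZE))  # 240.0, the "~250/day" figure

# Target from the post: a 100x faster script, i.e. 1 billion comparisons/hour.
target = 100 * 10_000_000
print(images_checked_per_day(target, DB_SIZE))      # 24000.0 images/day
print(target / 3600)                                # comparisons needed per second
```

The last line is where the "278,000 analyses per second" figure that comes up later in the thread originates: 1 billion per hour divided by 3600 seconds.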
Quote:
The reason is that many users have mediocre / saturated (i.e. other computers sharing / torrents / malware / bots, etc.) slow internet connections, which will adversely affect streaming, while many pirates have good connections and fast computers optimized for video capturing. On a related note regarding video quality ... many surfers plain do not care as long as it's good enough. Heck, even the worst quality capture will likely still be far clearer and better than the average poor quality VHS pirated copies that people used to widely trade. With regard to embedded user member info: what good does that really do? Presumably many members have few assets to go after. And more to the point, as was highlighted in another thread, much of the pirating is being done by others within the industry. Anyway, member info in files is little more than a speed bump to the pros ... throwaway credit cards, anonymized IP addresses, VPNs, botnets, etc. make it easy to get around all that, leaving a dead-end trail / pinning the blame on unsuspecting users. Ron |
Quote:
Well, I can't test this accurately as I don't have a large enough dataset. I only have 55 pics... I did ask for a dataset - if you can email me a link where I can download a zipped dataset, I can give it a try. Testing something that is taking 0.1 sec is not accurate enough to project up to large datasets. Plus, the app can load in > 1 image at a time so multi-instance comparisons can be achieved. I just need a decent enough dataset to be able to test - I will trash the dataset when finished testing. |
Quote:
The larger numbers just come from putting a bit more thought into real application of the script. What size of image are you comparing against? I would have thought that if it can reliably recognise thumbs, it would whizz through them compared to an 800x600 image or bigger, and a lot of board images are thumb previews. |
Once the hash db has been created, comparing thumbs against it would be faster, yes. And quality of matching wouldn't decrease, just because it's a thumb. Best though to create the original hash db against full size, to get the best quality plot of the pixels involved.
If someone finds a forum or something with their content on it, drop me an email giving me permission to download the content for testing and I'll use that as a base - the more images the better! btw, script tested and runs 8x faster on a quad core as opposed to a dual core. |
Quote:
Tagging the content is an interesting concept for suing the people that distribute it, that's for sure. Your idea on the first page of streaming only and protecting the stream, however, is not good. The income lost because of unhappy members is far bigger than anything "fixing" piracy will ever return. |
borked,
sorry for my ignorance.. but how exactly does hashing images help you find stolen videos? Or am I missing the point of all of this? What images are you hashing and comparing to? Places like PornBB will never have an image on a post which comes cleanly from the members area. 99% of the time it will be a video thumbnail compilation image. |
The image hashing is for photosets.
|
Quote:
However, there are quite a number of sites that I know of that only allow streams. Don't discount an idea just because it isn't what "the big sites" use... Quote:
It's detection by comparison, not by fingerprinting. Maybe not as watertight, but certainly useful enough in an automated setting to flag certain content for verification by a human eye. Makes finding a needle in a haystack easier when you have a giant magnet to help you look :upsidedow |
Quote:
ah! in all honesty, there's no way this app is going to be able to do 278,000 analyses per second... --edit: if you read the post above using the hashed db, you'll see where the problem lies. It takes ~0.1 sec to generate a pixel hash of an image; the comparison of that hash against a hashdb is lightning fast. I will see if I can reduce the time to generate the hash, but on the order of 10^5 faster? No ways... Quote:
|
Quote:
Don't discount something just because the big sites don't use it. If downloads were so important, tubes wouldn't exist. Anyway, this isn't the thread for discussing this kind of stuff. It's about putting the options and possibilities on the table so each producer/site owner can make their own decisions. I'm all for giving/having the choice. |
Quote:
looking forward to testing them out :thumbsup |
OK, against a hash db of 10,315 images, I took a dataset of 281 images and asked it to find the matches. It took 38.52 seconds. The vast majority of that time (36.53 secs) was taken up by creating the hashes of the 281 images (which was expected).
That's 2,898,515 hash-against-hash calculations in 2.01 seconds (5.19 billion calculations per hour). |
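The arithmetic behind those figures can be checked in a few lines of Python. All numbers are the ones reported in the post; the variable names are only for illustration.

```python
db_hashes = 10_315       # images in the reference hash db
query_images = 281       # images in the query set
compare_seconds = 2.01   # time reported for the hash-against-hash stage
hashing_seconds = 36.53  # time reported for hashing the query images

# Every query hash is compared against every db hash.
pair_comparisons = db_hashes * query_images
per_hour = pair_comparisons / compare_seconds * 3600

print(pair_comparisons)                          # 2898515
print(round(per_hour / 1e9, 2))                  # 5.19 (billion per hour)
print(round(hashing_seconds / query_images, 2))  # 0.13 s to hash one query image
```

So the comparison stage is effectively free; nearly all of the cost per query is the ~0.13 s needed to hash the incoming image.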
borked, very nice...
now, if we could only set up our own industry-wide video fingerprinting so we do not have to pay some company to run it, that would be lovely... The question is, can we do it without using some patent that possibly exists? Your current hashes change if the image is resized (preserving aspect) or lowered in quality, right? So how can we build hashes which are still accurate enough but do not care about resizing or quality loss? Any thoughts on that? I am wondering whether, if you reduce an image to a very low resolution, like 50x50 or so, the colors would come out close enough regardless of how the image is cut or changed in quality. I.e., take a square part of the inside of an image of around 1000x1000 and resize it to 50x50 using a standard resize technique which interpolates the colors. Then do this on two versions of the same image, JPEG at 100% and JPEG at 50% quality, and see what happens to the outcome; compare it visually... |
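The downscale-then-compare idea can be tried out with a rough sketch like the one below. It is a generic "average hash" (shrink to a tiny grid, then one bit per cell depending on whether the cell is brighter than the mean), not borked's hashing method, and nothing here says anything about patents. To stay dependency-free it assumes images are already decoded into 2D lists of grayscale values; the 8x8 grid and the synthetic test images are arbitrary choices for illustration.

```python
def box_downscale(pixels, size=8):
    """Shrink a 2D grid of grayscale values to size x size by averaging blocks.
    Assumes the image is at least size pixels in each dimension."""
    height, width = len(pixels), len(pixels[0])
    small = []
    for row in range(size):
        y0, y1 = row * height // size, (row + 1) * height // size
        out_row = []
        for col in range(size):
            x0, x1 = col * width // size, (col + 1) * width // size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            out_row.append(sum(block) / len(block))
        small.append(out_row)
    return small


def average_hash(pixels, size=8):
    """One bit per cell: 1 if the cell is brighter than the mean of all cells."""
    cells = [v for row in box_downscale(pixels, size) for v in row]
    mean = sum(cells) / len(cells)
    return [1 if v > mean else 0 for v in cells]


def hamming(a, b):
    """Number of differing bits; a small distance suggests the same picture."""
    return sum(x != y for x, y in zip(a, b))


# Synthetic stand-ins for a real photo: a diagonal gradient at two sizes,
# a coarsely quantised copy to mimic heavy quality loss, and an inverted
# copy standing in for a different picture.
full = [[(x + y) % 256 for x in range(128)] for y in range(128)]
thumb = [[(2 * x + 2 * y) % 256 for x in range(64)] for y in range(64)]
lossy = [[(v // 16) * 16 for v in row] for row in full]
inverted = [[255 - v for v in row] for row in full]

print(hamming(average_hash(full), average_hash(thumb)))     # 0
print(hamming(average_hash(full), average_hash(lossy)))     # 0
print(hamming(average_hash(full), average_hash(inverted)))  # large distance
```

On this toy data, resizing and quantising leave the hash unchanged while the inverted image lands far away. Cropping is a different matter: it shifts which pixels fall into each cell, so this sketch says nothing about how well the approach survives the cropping part of the question.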
better still...
I created 120x90 thumbs of all the 281 query images (which would also distort them all as it doesn't preserve aspect ratio) and used the thumbs to query the db... it took 3.69 seconds :upsidedow |
Quote:
See my point above for distorting images - I used the imagemagick command mogrify -format jpg -define jpeg:size=240x180 -thumbnail 120x90 '*.jpg' to make thumbs of all the images in the query set and all came back with the original image found in the db. 2 false negatives slipped through. This is only for images, not videos and no patents were violated in its creation. |
an eg of original image that had been pre-hashed and a thumb that found its original image when queried
http://borkedcoder.com/image_compari...g004_007_o.jpg http://borkedcoder.com/image_compari...g004_007_t.jpg You can see that the query image has lost a lot of information, but still finds its original. So much faster and about as efficient to do this on thumbs, so long as the reference db was created on full-size images. |
Nice read borked
This is what I implemented over 2 years ago on our videos. And you are 100% on the money. As for the folks convinced they will lose members or it will be a "miserable experience": Claudia-Marie's site is kicking ass on both new sales and rebills.
The high quality stream cannot be downloaded, and since it's h264 compression I'm able to keep the bit rate down to 1.2 to 1.5 Mbps, which doesn't put a lot of strain on the customer's computer. I've actually done this for a little more than 2 years. And since I do all the support for the site AND we have a "wall" on the main page of the members area for members to write anything they like...I can tell you that it's a non-issue with our members.
The one thing I did to completely stop any complaints was I actually DO give them a downloadable version. But the DL version is only 480 x 272 and is a much lower bit rate. Just enough for an honest member to be able to watch it when he's not online...but just crappy enough to not be the kind of quality that most would want to upload to a pirate site. Every once in a while they do...but removeyourcontent comes along and gets those down in a timely manner. And that simple thing eliminated all complaints.
Surfing her members area myself...I would never even bother with the download. The player in the members area is so much better. Big high def stream that starts instantly, and I can click anywhere in the timeline and start streaming from that point instantly. So if I don't want to watch the blowjob, I can go straight to the anal. It's truly a MUCH better experience than my original .wmv setup. Looks better, works better, and gives them an experience that they are used to from YouTube (only mine streams better lol).
And yeah...the high res stream displays their username and IP address for a couple of seconds at random times in random places on the vid. Nobody even really notices it because it's small and unobtrusive. But IF they ever screen record one of my high res streams...I'm going to know EXACTLY who did it...right down to their real name and address straight out of the NATS database. And then? They would be getting a letter from our lawyer and paying a settlement.
I can report to all of you that "yes" it does indeed work. And yes it has made money hand over fist for me since I first implemented it. Of course we offer a lot in our members area which is important too. Lots of interaction, a "community" with profiles, etc. and Claudia-Marie does her webcam show every week for them as well. I personally communicate with our members in the members area (unlike most of the "big" paysite owners) and I know exactly what the concerns of our members are. And it ain't whether or not they can download the streams. Our members are more interested in putting in requests for what they want to see CM do in her next scene. It's a remarkable concept...the members are actually interested in the PORN and not the delivery. That's what happens when you give an innovative and high quality delivery system and combine it with a product that they want. |
Quote:
If my calculations are correct, at that speed you can compare ~6,500 images/day with a database of 1 million pics (hashes), and if using thumbnails only that goes up to ~65,000 images/day, more than enough to keep ALL boards and torrents under control image piracy wise. |
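One way to arrive at those daily figures is to scale borked's measured batch times up to a 1 million hash db. The sketch below assumes that the whole batch time (hashing included) grows linearly with db size, which appears to be what the estimate does; that reading is a guess, and the function name is made up for illustration.

```python
def images_per_day(batch_seconds, batch_images, tested_db, target_db):
    """Scale a measured batch run to a larger db, assuming the whole run time
    (hashing included) grows linearly with db size."""
    scaled_seconds = batch_seconds * target_db / tested_db
    return 86_400 / scaled_seconds * batch_images


# Measured runs from the thread: 281 query images against 10,315 hashes.
full_size = images_per_day(38.52, 281, 10_315, 1_000_000)  # full-size queries
thumbs = images_per_day(3.69, 281, 10_315, 1_000_000)      # 120x90 thumb queries
print(round(full_size), round(thumbs))
```

This lands close to the ~6,500 and ~65,000 figures. Since the time to hash a query image should not depend on how big the db is, linear scaling of the whole run is probably on the cautious side.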
If I ran my own pay site, I would be upset if people stole content, however I'd be even more pissed at the people MASS DISTRIBUTING it. Targeting torrent sites / tube sites with law or some kind of brute force would kill the mother feeding the children.
|
borked, nice.. what happens if you lower the jpg quality of the thumbnail to 50 or even 25? Does it still find the right match?
What if you crop the thumbnail? What if you do both crop and lower quality? |
Quote:
With such an accuracy, and with some additional programming we can even detect multi-thumbnail previews that are popular at surfer boards too. |
Quote:
I've taken a single master image and made the following "attacks":
lks_g004_007.jpg - original image
lks_g004_007_o.jpg - another orig image
lks_g004_007_t.jpg - 90x120 thumb of orig
lks_g004_007_t_10.jpg - 90x120 thumb with jpeg quality of 10
lks_g004_007_c_60.jpg - crop of orig image at jpeg 60
lks_g004_007_c_t.jpg - crop of c_60
lks_g004_007_c_t_10.jpg - as c_t but jpeg 10
All passed bar lks_g004_007_c_t_10.jpg. c_t brought up 2 false negatives. All images in the test query set can be found here. Output is:
Code:
./hashdbQ images/test-queryattacked hashes.mvp |
I need to merge the code that compares 1 image against 1 image with the newer code of comparing against a hashdb, since if you load the orig image and cropped thumb 10% jpeg, it gets matched at 82% confidence, whereas the hash scanning misses it.
|
Automated adult video copyright protection
Just something you may want to add to your educational series entry.
CopyMotion - adult video copyright protection service. We have recently launched a service that requires no watermarking or alterations of any kind to find and remove your video content from tube sites. Our proprietary technology is computer vision based, just like the FSC Anti Piracy / Vobile program. The tube videos can be severely altered and they will still be found. This is explained in detail in the Video Matching Algorithm section of our website. What's really unique is that we bring this to the table at a much lower price point than our competition. We believe for this technology to have a real impact in helping the adult industry it must be affordable to all content producers, not just the mega studios. Cheers. |
This is all fascinating stuff, but one question, are pirates really hiding at all? I get the impression threads posting pirated stuff just use the name of the video on the title or post... there's no need to detect the image automatically when they're just calling it "Torrent of Video_name"...
|
Quote:
Then it evolved into detection. One of the main reasons for automated detection would be to save you the hassle of finding it to help in the sending out of take down notices... |
Surely there must be a use for this on dating sites, to auto-nuke all the scamming Raven Riley etc. profiles.
|