Is Anyone Using The Full Data Dump From HubTraffic?..
What tools are you using? - It's massive and I am struggling...
Thanks..... |
You should be careful with word choice, otherwise you can attract currently sober into the thread :D
And yes, I use the full dump as well; I made my own script to parse it, plus a script to parse the weekly updates. |
Good luck, I have never been able to successfully pull from those dumps. Hope someone can shed more light on it as well.
|
Quote:
lol - True... |
Yes, I have the full pornhub/redtube/tube8 dumps on my sites.
Like klentelaris I use my own script. You can fit the whole thing on a single Elasticsearch node, but you need a decently beefy machine. It's definitely too big for most of the cookie-cutter scripts people use (smart-cj, wp-tube, tube-ace); I bet mechbunny could handle it though, with a big enough server. |
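For anyone curious, here is a minimal sketch of what bulk-loading the dump into a single Elasticsearch node can look like with the official Python client. The index name and field list are assumptions, not the poster's setup, so check the column order of the dump you actually download.
Code:
# Stream the pipe-delimited dump into a single Elasticsearch node in bulk.
# Index name and FIELDS are assumptions - adjust to the real dump layout.
import csv
from elasticsearch import Elasticsearch
from elasticsearch.helpers import streaming_bulk

FIELDS = ["embed", "thumb", "title", "tags", "categories", "duration", "views"]  # assumed layout

def actions(path):
    csv.field_size_limit(10_000_000)  # embed codes can be long
    with open(path, newline="", encoding="utf-8", errors="replace") as f:
        for row in csv.reader(f, delimiter="|"):
            if row:
                yield {"_index": "hubtraffic", "_source": dict(zip(FIELDS, row))}

es = Elasticsearch("http://localhost:9200")
indexed = 0
for ok, _ in streaming_bulk(es, actions("pornhub-full.csv"), chunk_size=5000, raise_on_error=False):
    indexed += ok
print("indexed", indexed, "docs")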
Those scripts are all extremely resource hungry. We need some way to break the dump down; the one above cuts off at set amounts then carries on, and it's very slow.
|
Quote:
good fuckin luck with that |
Quote:
next.......... lol |
I think I will have to start from scratch and go with the guy who is jetting around the world while airport staff smash his equipment. What is his tube script?
|
Quote:
The advantage of doing it this way is you can run just one box to feed an unlimited number of websites with data through the API. |
Quote:
Thanks..... |
Quote:
And speaking about databases: I tested MariaDB while I was doing the import of this for the first time, but for some reason, while MariaDB loads content faster than MySQL, it is terribly slow when it comes to inserting - it took 12 hours to import the hubtraffic dump, while with MySQL it took only 3 hours. So I figured out how to optimize MySQL to load content fast and there was no need for MariaDB anymore. |
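For reference, this is roughly the kind of session-level tuning that makes a plain MySQL bulk insert fast. The connection details, table and column layout below are placeholders for illustration, not the poster's actual schema.
Code:
# Session tweaks that usually speed up a one-off MySQL bulk import.
# Connection details, table and column names are placeholders.
import mysql.connector

conn = mysql.connector.connect(host="localhost", user="tube",
                                password="secret", database="hubtraffic")
cur = conn.cursor()
cur.execute("SET unique_checks = 0")        # skip per-row checks during the load
cur.execute("SET foreign_key_checks = 0")
cur.execute("SET autocommit = 0")
batch = []
with open("pornhub-full.csv", encoding="utf-8", errors="replace") as f:
    for line in f:
        fields = line.rstrip("\n").split("|")
        if len(fields) < 7:
            continue
        views = int(fields[6]) if fields[6].isdigit() else 0
        batch.append((fields[0], fields[3], views))   # embed, title, views (assumed positions)
        if len(batch) >= 10000:
            cur.executemany("INSERT INTO videos (embed, title, views) VALUES (%s, %s, %s)", batch)
            batch = []
if batch:
    cur.executemany("INSERT INTO videos (embed, title, views) VALUES (%s, %s, %s)", batch)
conn.commit()
cur.execute("SET unique_checks = 1")
cur.execute("SET foreign_key_checks = 1")
conn.close()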
One thing this all reminds me of: back in the days of tube scripts and XML/Atom/CSS feeds, like some of you, I was trying different ways to get this to work,
and I fucked up score feeds somehow and started showing hundreds of thousands of hits in the program admin. He was asking what the hell was going on. I will see if I can take some screenshots. He didn't close the account as I generally had no idea what was going on, and still don't. |
No joy, the stats only go back to 2009, and this was 06/08.
|
Maybe find a tool that can split it into smaller files.
|
I wrote a Python script to deal with it; it keeps everything in memory (I have an old server with tons of RAM) and finishes in a few minutes.
|
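Something like the sketch below, assuming the box has enough RAM for the whole dump and that the first column can serve as an ID; both are assumptions, not details from the post above. It also writes the result back out tab-separated, which makes the later DB imports in this thread easier.
Code:
# Rough sketch of the in-memory approach: read the whole pipe-delimited dump into RAM,
# dedupe on an (assumed) ID column, and write it back out tab-separated.
import csv

def convert(in_path, out_path, id_col=0):
    csv.field_size_limit(10_000_000)
    rows = {}
    with open(in_path, newline="", encoding="utf-8", errors="replace") as f:
        for row in csv.reader(f, delimiter="|"):
            if row:
                rows[row[id_col]] = row     # last occurrence wins
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f, delimiter="\t").writerows(rows.values())

convert("pornhub-full.csv", "pornhub-full.tsv")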
Split Axe may do it. I have a copy if you need it.
|
For huge files I have used LOAD DATA INFILE; it works best if you don't want any manipulation done on the data.
|
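For anyone who hasn't used it, here is a minimal sketch of LOAD DATA INFILE against the pipe-delimited dump, issued through Python. The table name, file name and credentials are placeholders, and the server needs local_infile enabled.
Code:
# LOAD DATA LOCAL INFILE for the same pipe-delimited dump.
# Table, file and credentials are placeholders; requires local_infile on the server.
import mysql.connector

conn = mysql.connector.connect(host="localhost", user="tube", password="secret",
                                database="hubtraffic", allow_local_infile=True)
cur = conn.cursor()
cur.execute("""
    LOAD DATA LOCAL INFILE 'pornhub-full.csv'
    INTO TABLE videos
    FIELDS TERMINATED BY '|'
    LINES TERMINATED BY '\\n'
""")
conn.commit()
conn.close()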
Quote:
This example splits a file of any size into smaller files, each with 5000 lines:
Code:
split -l 5000 anyfile.ext newfile
split is part of GNU coreutils, so it is already installed on practically every Linux box (and macOS ships its own split). If it is somehow missing, on Ubuntu/Debian type
Code:
sudo apt-get install coreutils
on CentOS/RHEL
Code:
sudo yum install coreutils
and with Homebrew
Code:
brew install coreutils |
Ooops - I split it into 500MB chunks and I am doing a find and replace from | to ^t - I probably should have gone a bit smaller - lol...
|
How did that go for you? 9GB down to 500MB chunks, then what? Did it work?
|
No, I just use their API to pull in the good videos I want and bypass all the junk shitty vids.
|
Quote:
To be honest, it's a chance to learn about 'big data' and related tech. I can then use that knowledge for something that might actually make me some money!.... |
Quote:
If you're keen to learn the most popular PHP framework around, which includes support for big-data-related technologies such as Algolia, then learn Laravel. https://laravel.com https://laracasts.com |
OK - So this is the flow I used:
Download/unzip the CSV dump -> Use FileSplitter to split the CSV into 500MB files -> Use Notepad++ to find and replace pipes with tabs (makes DB import easier) -> Use Navicat to import into MongoDB... Now I just have to learn how to use the data lol.....
Notes: This was all done on a local Windows machine - Navicat will import into an external MongoDB instance if needed - It's a lot of data and you need to be patient - Either that or start a large Google Cloud instance and do it on there, I wish I had done that lol... |
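As a possible shortcut, the split and find-and-replace steps can be skipped by streaming the dump straight into MongoDB. A rough sketch, with placeholder database, collection and field names rather than the poster's actual setup:
Code:
# Same pipeline without the manual split / pipe-to-tab steps:
# stream the pipe-delimited dump straight into MongoDB in batches.
import csv
from pymongo import MongoClient

FIELDS = ["embed", "thumb", "title", "tags", "categories", "duration", "views"]  # assumed layout
client = MongoClient("mongodb://localhost:27017")
col = client["hubtraffic"]["videos"]
batch = []
with open("pornhub-full.csv", newline="", encoding="utf-8", errors="replace") as f:
    for row in csv.reader(f, delimiter="|"):
        if row:
            batch.append(dict(zip(FIELDS, row)))
        if len(batch) >= 10000:
            col.insert_many(batch)
            batch = []
if batch:
    col.insert_many(batch)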
For people using the API:
the limit is about 40 requests per 10 seconds, so it's not feasible to use the API for search/tag queries on a live site; you need to implement your own search function. |
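If you do hit the API from your own backend, a small client-side throttle keeps you under that limit. A minimal sketch; the URL and parameters below are placeholders, not the real HubTraffic endpoint.
Code:
# Simple client-side throttle to stay under roughly 40 requests per 10 seconds.
# The URL and parameters are placeholders, not the real HubTraffic endpoint.
import time
import requests

WINDOW, LIMIT = 10.0, 40
stamps = []

def throttled_get(url, **kwargs):
    now = time.monotonic()
    while stamps and now - stamps[0] > WINDOW:
        stamps.pop(0)                       # forget requests older than the window
    if len(stamps) >= LIMIT:
        time.sleep(WINDOW - (now - stamps[0]))
    stamps.append(time.monotonic())
    return requests.get(url, timeout=30, **kwargs)

resp = throttled_get("https://api.example.com/search", params={"search": "example", "page": 1})
print(resp.status_code)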