Old 10-11-2011, 01:24 PM   #1
camperjohn64
regular expression help

I want to clean a database of words that have bad letters in them. I know how to remove bad characters, but how can I remove the word along with it?

"The qu&&ick brown fox ju&&&mped over the lazy dog."

assuming & is a bad character, how can I end up with

"The brown fox over the lazy dog."

Basically, any word that doesn't have an alpha-numeric or [.,<>?~@#$%^&*()] I want to remove the word.

Single preg_replace expression??
Old 10-11-2011, 01:33 PM   #2
fris
Is it WordPress, or just another site DB? If it's WordPress, you could use the search and replace plugin.
Old 10-11-2011, 01:39 PM   #3
fris
Basically, you want to execute

Code:
update [table_name] set [field_name] = replace([field_name],'[string_to_find]','[string_to_replace]');
or here is a tool

http://sewmyheadon.com/2009/mysql-search-replace-tool/
Old 10-11-2011, 01:39 PM   #4
camperjohn64
Quote:
Originally Posted by fris View Post
is it wordpress? or just another site db, if it was wordpress, you could use the search and replace plugin.
The problem isn't removing bad characters; the problem is deleting words that contain bad characters.
Old 10-11-2011, 02:05 PM   #5
raymor
Quote:
I want to clean a database of words that have bad letters in them. I know how to remove bad characters, but how can I remove the word along with it?

"The qu&&ick brown fox ju&&&mped over the lazy dog."

assuming & is a bad character, how can I end up with

"The brown fox over the lazy dog."

Basically, any word that doesn't have an alpha-numeric or [.,<>?~@#$%^&*()] I want to remove the word.
You've asked for two very different things. Removing words that DO have bad characters is different from removing words that do NOT have "good" characters. What if it has both?

To remove words that have the "bad" character:

\w is the class of word characters. You're looking for a string containing at least one "bad" character and, optionally, some word characters.
"Words", as you define them, are strings of word characters and &, which is represented as \w|& .
So, assuming & is the bad character, the regular expression is:

(\w|&)*&(\w|&)*

preg_replace('/(\w|&)*&(\w|&)*/', "", $subject);
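
Run against the sample sentence from the question, the pattern above removes the offending words but leaves the spaces on both sides of them behind. The sketch below shows that, plus one possible variant that also swallows the trailing whitespace. The variant is an assumption of this example, not something posted in the thread.

```php
<?php
// Sample sentence from the question; '&' stands in for the bad character.
$subject = 'The qu&&ick brown fox ju&&&mped over the lazy dog.';

// Pattern as given above: the word goes, the surrounding spaces stay.
$asGiven = preg_replace('/(\w|&)*&(\w|&)*/', '', $subject);

// Variant (assumption): a character class instead of alternation, plus \s*
// to consume the whitespace that followed the removed word.
$tidied = preg_replace('/[\w&]*&[\w&]*\s*/', '', $subject);

echo $asGiven, "\n"; // The  brown fox  over the lazy dog.
echo $tidied, "\n";  // The brown fox over the lazy dog.
```

If the bad word is the last word in the string, the variant leaves the space before it in place, so a final trim() may still be wanted.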


Quote:
Basically, any word that doesn't have an alpha-numeric or [.,<>?~@#$%^&*()] I want to remove the word.
Removing words based on what they do NOT have is different from removing them based on what they DO have, as above. In this case, you're looking for strings of [^\w\s.,<>?~@#$%^&*()], bracketed by space characters, I suppose, since .,? and other non-word characters are part of your class. The \w and \s inside the negated class matter: without them the class also matches ordinary letters and spaces, and the pattern would swallow whole runs of good words.
So you're looking for:
\s[^\w\s.,<>?~@#$%^&*()]+\s

and replacing it with a single space delimiter like this:

preg_replace('/\s[^\w\s.,<>?~@#$%^&*()]+\s/', " ", $subject);
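
A quick check of this second approach on a made-up string. It compares a negated class that lists only the punctuation with one that also excludes \w and \s. The sample text and variable names are hypothetical.

```php
<?php
// Hypothetical sample: '!!!' has no alphanumerics and none of the listed punctuation.
$subject = 'The quick !!! brown fox.';

// Negated class listing only the punctuation: letters and spaces are not
// excluded, so the match runs from the first space to the last one.
$bare = preg_replace('/\s[^.,<>?~@#$%^&*()]+\s/', ' ', $subject);

// With \w and \s excluded as well, only the all-"bad" word is matched.
$tight = preg_replace('/\s[^\w\s.,<>?~@#$%^&*()]+\s/', ' ', $subject);

echo $bare, "\n";  // The fox.
echo $tight, "\n"; // The quick brown fox.
```

Because the pattern consumes the space on both sides, two such words in a row share a space and only the first is removed in one pass. A word at the very start or end of the string is not matched at all.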
Old 10-11-2011, 02:13 PM   #6
Bladewire
DAMN that was quick, well done!
Old 10-11-2011, 02:26 PM   #7
camperjohn64
Yes, thanks - you are correct in my typo. And thanks for the answer - coding now :-)
Old 10-11-2011, 03:11 PM   #8
raymor
Quote:
Originally Posted by Squirtit View Post
DAMN that was quick, well done!



This stuff was hard in 1997 when we were trying to get referer based .htaccess right.
We had to watch out for things like goodguy.com.hacker.com

I've had a bit of practice since then.

Camperjohn, what I posted is only known to be correct, not tested.
Old 10-11-2011, 03:41 PM   #9
woj
<&(©¿©)&>
 
I would just write a quick script to do that, fetch text from db, split into words, check each word, unsplit, save it...

a bit slower and less efficient, but pretty hard to fuck up... on the other hand with one regexp command, one wrong character and your whole db could get fucked up...
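
A minimal sketch of that split / check / unsplit idea for a single string, leaving out the database fetch and save. The function name remove_bad_words and its default whitelist are hypothetical. The whitelist is the allowed set from the original question, which includes &, so the sample uses ! as the bad character instead.

```php
<?php
// Hypothetical helper: keep only words made up entirely of allowed characters.
// Allowed set taken from the question: alphanumerics plus .,<>?~@#$%^&*()
function remove_bad_words(string $text, string $allowed = '/^[A-Za-z0-9.,<>?~@#$%^&*()]+$/'): string
{
    // Split on runs of whitespace; PREG_SPLIT_NO_EMPTY drops empty pieces.
    $words = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);

    // Check each word against the whitelist pattern.
    $kept = array_filter($words, function ($word) use ($allowed) {
        return preg_match($allowed, $word) === 1;
    });

    // "Unsplit": every whitespace run in the input becomes a single space.
    return implode(' ', $kept);
}

echo remove_bad_words('The qu!!ick brown fox ju!!!mped over the lazy dog.'), "\n";
// The brown fox over the lazy dog.
```

Each step can be inspected on its own, which is the appeal of this route. The cost is that the original spacing and line breaks are not preserved.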
Old 10-11-2011, 05:23 PM   #10
camperjohn64
Quote:
Originally Posted by raymor View Post


This stuff was hard in 1997 when we were trying to get referer based .htaccess right.
We had to watch out for things like goodguy.com.hacker.com

I've had a bit of practice since then.

Camperjohn, what I posted is only known to be correct, not tested.
I tested first of course
Old 10-11-2011, 06:08 PM   #11
raymor
Quote:
Originally Posted by woj View Post
I would just write a quick script to do that, fetch text from db, split into words, check each word, unsplit, save it...

a bit slower and less efficient, but pretty hard to fuck up... on the other hand with one regexp command, one wrong character and your whole db could get fucked up...

Yes, definitely. Either way, one would first do a database dump or CREATE TABLE backup SELECT * FROM thetable.

Messing up is okay, it happens. Breaking things is not. You'd want to back up either way because, for example, even if you get the word deletion perfect, join is not the inverse of split, so data could be lost by splitting and joining.
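
To illustrate the point about split and join, here is a small sketch with a made-up string. Splitting on whitespace and joining with a single space changes the text even when no word is removed. Capturing the delimiters is one way (an assumption of this example, not something from the thread) to make the round trip lossless.

```php
<?php
// Hypothetical stored text with a double space and a line break.
$original = "first  line\nsecond line";

// Split on whitespace, change nothing, join again with single spaces.
$roundTrip = implode(' ', preg_split('/\s+/', $original));

// Keeping the delimiters (PREG_SPLIT_DELIM_CAPTURE) preserves the original spacing.
$parts = preg_split('/(\s+)/', $original, -1, PREG_SPLIT_DELIM_CAPTURE);
$lossless = implode('', $parts);

var_dump($roundTrip === $original); // bool(false)
var_dump($lossless === $original);  // bool(true)
```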

Last edited by raymor; 10-11-2011 at 06:14 PM..