Wednesday, September 08, 2010 03:40:50Login · Register
 

    Challenge Activity
02:59:54 - lacanian
     - Completed basic [13]
02:47:44 - chuiy
     - Completed basic [8]
10:12:24 - blabla
     - Completed basic [2]
10:04:32 - blabla
     - Completed basic [1]
09:22:30 - satishek
     - Completed privilege [4]
09:13:29 - mandrake
     - Completed crack [2]
04:18:55 - Iceheart456
     - Completed decrypt [3]
04:18:28 - InferiorHell
     - Completed decrypt [3]
04:18:12 - Iceheart456
     - Completed decrypt [2]
04:17:02 - Iceheart456
     - Completed decrypt [1]
04:16:34 - InferiorHell
     - Completed decrypt [2]
03:54:07 - InferiorHell
     - Completed decrypt [1]
03:42:18 - InferiorHell
     - Completed basic [4]
03:06:00 - mandrake
     - Completed decrypt [7]
02:54:04 - lacanian
     - Completed basic [12]
02:31:49 - am107cs019
     - Completed decrypt [7]
 

    Scoreboard Top 20
UserPoints
Abhineet4795   
auditorsec4795   
ne0114795   
Null Set4795   
blandyuk4780   
bluechill4750   
Teddy4730   
TurboBorland4475   
Qwexotic4460   
tiiger11114205   
preet4180   
LiquidFusi0n4175   
OnlyHuman4125   
samthg4110   
satishek3900   
pilchdragon3660   
Override3655   
chronic123640   
dash803590   
Torrment3515   
 

    Login
Username

Password



Not a member yet?
Click here to register.

Forgotten your password?
Request a new one here.
 

    Users Online
· Guests Online: 7

· Members Online: 0

· Members on IRC: 7
TurboBorland, thebigbucks, sirEgghead, Satan911, NoX, louve, LK

· Bots Online: 1
GoogleBot

· Total Members: 1,476
· Newest Member: blabla
 

 

 

 

    Top 10 Forum Posters
UserPosts
bluechill915   
Qwexotic692   
cruizrisner476   
Null Set350   
TurboBorland331   
Stormc1nd3r308   
auditorsec299   
madf0x296   
Override238   
jakecrepinsek235   
 

    Affiliates
 

Advanced KMP String Search
     
Sooner or later you're going to have to deal with pattern matching at a lower level than you're typically accustomed. For instance, you may need to implement a search feature within a text editor. What follows is an example of the KMP (Knuth–Morris–Pratt) algorithm with a nice little hacker twist to give it slightly better performance.

For those who are unfamiliar, the basics of this algorithm are simple. Rather than perform a search against every character of the input ( i.e. strcmp(); ), we instead perform a match against only key positions of the string. This allows us to easily eliminate positions from the search space simply because the key positions do not match. Here's an example

the universe: ABAABACDEGAAA
the key:........EGA

On the first pass, we compare ABA to EGA using only key position 0. A is not equal to E so we can avoid that position and move to the next, which saves the time of doing a string comparison on the remaining pieces. So the next pass becomes

the universe: ABAABACDEGAAA
the key:..........EGA

B is not equal to E, so skip that position as well. Only when we've found a matching key position, do we start to evaluate the remaining characters. It may not seem like it, but this is actually a tremendous time savings over doing a naive strcmp() on each key position. However, a problem arises in the speed of the algorithm, when we have a universe that has many potential matches. For example:

the universe: EGXEECAEGADEGA
the key:........EGA

In this scenario, we spend a bit of extra time looking at sections that have similar key positions, ultimately to find out, that they never result in a perfect match. Using the KMP algorithm, this can't really be avoided. It's just how the algorithm works. But, it can be improved. Time for that hacker twist I talked about.

What happens if we examine two positions at once? We can effectively cut our comparison time in half by narrowing the search space during each pass from both ends. If we get a match with the first key position, and the second key position, we know that there may be a need to keep processing, and if not, we've found a entire section of the universe which can safely be ignored, as opposed to only a single key position.

For our purposes, I've chosen the initial key positions to be search_text[0] for the left key position, and strlen( search_text ) for the right. During each pass, non-matching sections of the universe are ignored, and then the left side is incremented, and the right side decremented. Saving even more time, since we no longer have to check the first and last positions. Hopefully you're still with me. If not, it's just the KMP algorithm working from both ends of the string. Simple eh?

So how to do this in C/C++? The toughest question we'll have to answer is: "How do we ignore an entire section of the universe?"

Well, how about using one of the standard containers? How about a stack? If we treat our search space as a stack, we can simply ignore non-matching patterns by popping them during each pass. The next pass will then automatically begin with only the sections of the search space that have the potential to form a match. The code below is that very scenario in a nutshell. It's only a demo of the algorithm, but with only slight modification you can use it for any of your string searching purposes. Enjoy.

Oh, and for more on string search algorithms be sure to check out the wiki.

Code
#include
#include

int main()
{
   std::vector search_space_stack;
   std::vector match_stack;

   char universe[] = \"This is a very, very, very, hairy algorithmvery\";
   char search_text[] = \"very\";

   int len_pattern = std::strlen(search_text);
   int previous_first_position = 0;

   // populate the universe (a.k.a search space) for the initial pass
   for(int i = 0; i < strlen(universe) - (len_pattern - 1); i++)
   {
      search_space_stack.push_back(i);
   }

   // the heart and soul of the search
   while( search_space_stack.empty() == false )
   {
      int start_position = previous_first_position;
      int end_position   = (len_pattern -1) - start_position;

      char first_char = search_text[start_position];
      char last_char = search_text[end_position];

      char test_char1 = universe[ search_space_stack.back() + start_position ];
      char test_char2 = universe[ search_space_stack.back() + end_position ];

      int result = (test_char1 == first_char) + (test_char2 == last_char);

      if( result != 2 )
      {
         search_space_stack.pop_back();
         previous_first_position += end_position;
      }
      else if( (end_position - start_position) < 1 )
      {
         match_stack.push_back( search_space_stack.back() );
         search_space_stack.pop_back();
      }
      else
      {
         previous_first_position++;
      }

   }

   // now to display our results
   for(int i = 0; i < match_stack.size(); i++)
   {
      char buffer[256];
      std::memset((void*)buffer,0,256);
      std::memcpy((void*)buffer,(void*)&universe[match_stack[i]],len_pattern);
      std::cout << match_stack[i] << \" \" << buffer << std::endl;
   }

   // clean up before exit
   match_stack.clear();

   return(0);
}



 
Comments
 
#1 | Qwexotic on 05/25/2010 21:13
Quite nice, though it gets a lot more "explainy-ish" so I turned it into an article. Wink
#2 | OnlyHuman on 05/26/2010 17:37
I'm just glad it's here. Thank you. Now that it's up though, I can see two mistakes in it...

edit:

And... they've been fixed! Thanks Qwexotic. I'm sure it was hardly noticeable, but I knew it was there.
#3 | Qwexotic on 05/26/2010 21:37
Fixed the code, as you requested in the pm. Wink
 
 
Post Comment
 
Please Login to Post a Comment.
 
 
Ratings
 
Rating is available to Members only.

Please login or register to vote.

Awesome! Awesome! 100% [1 Vote]
Very Good Very Good 0% [No Votes]
Good Good 0% [No Votes]
Average Average 0% [No Votes]
Poor Poor 0% [No Votes]