The intricacies of strstr
the deceptively simply sounding and oft-used function strstr has more than what meets the eye.
How many times is that you have searched a web-page/word document/filesystem for mactching strings? Probably more than once a day. The function that does string matching for you is strstr
This has the signature
char* strstr(const char* text , const char* pattern_tomatch)
It returns   a pointer  to the  located string, else a null pointer.
Lets go about implementing this function.
The first and most trivial implementation has a O(n2)  complexity
Heres the code for this
char* strstr(const char* s1 , const char* s2){
         // set tmp pointers to intial location
        char* tmp =  s2;
        char* tmp2 = s1;
         // null and empty checks  
        if (0 == s2) return  s1;
        // returns s1 if s2 is ""
        if (0 == *s2) return s1;
        if (0 == s1) return  0;
        if (0 == *s1) return 0;
         while(*s1){
                if (*s1 == *s2){
                        if (0 == *(++s2)){
                                return (s1-(s2-tmp-1));
                        }
                 }
                else {
                        s2 = tmp;
                        s1 = ++tmp2;
                        if (*s1 == *s2) s2++;
                }
                s1++;
         }
        return 0;
}
In order to simplify the above "obfuscated" code and to prove its O(n2), lets rewrite using a for loop
char* strstr(const char* s1, const char* s2){
    //skipping validations
    // bad practise, size of s1 and/or s2 can be > long
    unsigned long len_s1 = strlen(s1);
    unsigned long len_s2 = strlen(s2);
     unsigned long j = 0;
    for (unsigned long i=0;i
            if (s1[i] == s2[j]){
                    for (j=1;j
                            break;
                    }
            }
         if (j == len_s2+1) return s1+i+(j-len_s2);
    }
    return (char*)0;
}
I always prefer not using ints/longs as they are always susceptible to overflows and lots of  of (consequent) security bugs
I will now go about writing developing an algortihm for speeding up the whole process.
The most common thought that comes to the mind is to somehow use a hash table (hash table is logical to me because the problem involves a lookup of individual charaters)
Hash tables arent  suitable for this problem because hashes dont store ordering information.
For eg,
aaxbaa , ab  are to strstr'ed
a hash table would like
key | # occurences
a          4
b          1
x          1
the string to be matched "ab", will match the input str because both a and b exist in the hash table.
Obviously, this approach is no good. There should be a means to store the place of occurence of a character and not the number of occurences. So a hash table can look like this
key    | linked list of positions
a             0->1->4->5
b               3
x               2
Now the algorithm can be modified to do look for s2 ("ab" in our case) such that the lookups are in increasing, consecutive order.
so, a look of "a" will yield 0 and b will yield 2. o and 2 are in consecutive, so its not a match.
Suppose, we look fo "xb",  lookup for x yields  2 and lookup for b yields 3.
So (2,3)  is an increasing, consecutive order and we have a match.
Now, what if s2 had many occurences of a character. In order to scale,  the lookup should be engineered to take this into account.....
to be contd(with other possible heuristics) ......

