Oh no, we're being spammed!With this post I am starting a series of articles dealing about topics related to spam, as seen from a Grey Hat SEO’s point of view.

Let this be quite clear: I am not going to provide you with codes to develop spam engines, but I am going to use what I’ve learnt through these engines in order to provide you with theories and/or information that can be implemented both in white hat and black hat sites.

It’s up to you how to make best use of this knowledge and it’s also up to you to prove me wrong whenever you want to.

Enough of digressions, let’s start…

(please read slowly or you’ll have to learn about Markov Chains on Wikipedia)

Giorgio TavernitiLet’s suppose we are in a town of 100,000 inhabitants where everyone meets, during his/her life, only a certain number of people from that town, each of them at a different frequency. And let’s suppose that I am an inhabitant of this strange place and that one day I will meet by chance a good friend of mine called Giorgio.

Stefano GorgoniIt’s no neews, on the contrary, I have to say that I do meet this friend of mine very often, almost once a day to be precise. But today something has happened that does not happen very often: we met in front of a bar at 8am and there was no-one there. So we decided to go to Stefano’s place, a friend of ours, who lives not far away from this bar. I do not see him very often, but he does know him very well, he has known him for a couple of years and he says they meet up at least five times a week. Now that I’m thinking about it I have never happened to meet Stefano alone, the few times I’ve seen him Giorgio was almost always with us.

Stuart DeltaAnyway, we get to his place in a couple of minutes and ring the intercom, we are told to wait a little, Stefano is getting ready; also, he says we have to go and see a certain Stuart Delta, a friend of his who has just called a few minutes before, telling him he’s moving and asking whether he could help him. Stefano says hello, he says he will be ready right away and I ask Giorgio who this Stuart is. He replies that he has just got to know him, adding that he is a very good friend of Stefano’s, and they are almost always together and that the three of them go out every saturday night: he’s the person Giorgio sees the most when he goes out with Stefano.
But here comes Stefano, he says he’s sorry for taking so much time. We are told that Stuart lives just a bit further on and that it would be nice of us to help him moving.

Enrico AltavillaWe get to his place and a few minutes later Enrico, a very good friend of Stefano’s and Stuart’s, joins us: he has come to help. To my great surprise I find out that he is an old friend of mine and I tell him I was not expecting to see him there. He says “Hi” and asks me how come I am at Stuart’s and then, without waiting for a reply, turns around and shakes hands with Giorgio, introducing himself.

Markov chains base on one or more starting parameters (Giorgio and Me) to obtain one or more final parameters according to the probability the starting parameters are connected to the final parameter (Stefano): I do not know Stefano very well, Giorgio often sees him, I almost always meet him when Giorgio is with us: Me -/ Giorgio -/ Stefano.

Giorgio does not know Stuart very well, Stefano often sees him, that’s why he and Giorgio are meeting Stuart today, because, as I said, he is the person they see the most when they are together, so: Giorgio -/ Stefano -/ Stuart.

And this is why I have met Stuart today, because: Me -/ Giorgio -/ Stefano -/ Stuart.

And, finally, the fact that I know Enrico is not decisive for him coming to Stuart’s place. He actually did not even take too much notice of me. Giorgio had never met him, anyway.

Jacopo GonzalesAnd yet I might meet Jacopo tomorrow: a friend of mine, of Giorgio’s and Stefano’s. We might meet him just because we are together (Giorgio, Stefano and Me). Or maybe one day we decide to meet him, only Giorgio and Me. Or maybe I will meet him alone.

With a Markov chain, it is not necessary to have a certain number of starting parameters to obtain a final parameter. The final parameter, though, is likely to be generated by the starting paramenters, and the probability depends on these.

Just don’t worry if it’s not all clear, I will give more examples and as we go further into the issue you wil find out why it is interesting both for the Black Hat SEOs and for the White Hat SEOs.

Markov Chains GraphLet’s now suppose we have a start set (it was the town in the above example) where the two following cases are established: an ordered relationship among the elements (in the above example it was: Giorgio and Me; Stefano and Me, Enrico and Me; Stuart and Me; Giorgio and Me; Giorgio and Stefano; Giorgio and Stuart; Giorgio and Enrico; ..), and an ordered relationship between an ordered subset of elements and one element (In the town example it was: [Me and Giorgio] -/ Stefano is different from [Giorgio and Me] -/ Stefano).

Also, let’s suppose that every kind of relationship is coupled with a frequency ranging from 0 to 1.

Example (forget about Me, Stefano, Giorgio and so on.. let’s now use letters only and let’s define a new type of relationship):

  1. I -/ G = 1/2;
  2. G -/ I = 1/3;
  3. I -/ S = 1/7;
  4. S -/ G = 1/4;
  5. G -/ S = 1/6;
  6. [G -/ I] -/ S = 1/20;
  7. [I -/ G] -/ S = 1/10

As you can see now, unlike in the town example, frequency is based on the elements order, too. But don’t worry, let’s get rid of letters and give an example with words.

Percentage GraphLet’s suppose we have a text document dealing with dogs. This document is our set, whereas the words composing it are the elements.

Each word in the document will be followed by another with a certain frequency (number of cases of the ordered couple divided by the number of cases of the first word). For example, the word the is followed in the text by the word dog with a frequency of 1/20, as there are 200 cases of the word the and 10 of the ordered couple the dog. I’ll make clear what I mean with ordered couple for those who haven’t quite understood it: the dog is a couple formed by the and by dog, but it is different from the couple dog the: in fact, the couple the dog can have a different frequency compared to the couple dog the.

Also, in our document every ordered couple of words is followed by another couple with a certain frequency, for example in our document only these words follow the ordered couple the dog:

  1. is
  2. white
  3. Snoopy

SnoopyThe word is follows the couple the dog with a frequency of 1/2; in fact, there are 5 cases of the ordered triple the dog is in the document, out of a total of 10 cases of the couple the dog. The word white follows the couple the dog with a frequency of 2/5 and the word Snoopy follows the couple the dog with a frequency of 1/10.

Let’s suppose we know the frequency linking every couple of the document to each word of the document; we are talking about an ordered type of link. We will use this information in an array and will define a start couple, randomly chosen among all couples.

If we have a big array available we can get a fairly long and correct text, even if it’s nonsense to us humans.

Let’s suppose our start couple is the dog.

If we look for it in the array we will find out that it can be followed by 3 words (is; white; Snoopy), each of them with a certain frequency. One of these word is chosen at random (is) and we continue.

Black Hat & White HatWe now have the couple dog is, we look for it in the array and find out that it can be followed by 2 words (white, black). One of these two words is chosen randomly, based on the frequency (the word white is chosen) and we keep on with looking for the couple is white in the array …

Clearly, the text produced will be getting to a dead end in the following cases only:

  1. look in the array for a couple not associated with any other word (if it exists, there is only one for each array)
  2. look in the array for a couple A associated with a single word only and the second word in the couple forms a couple associated with one word only, and so on until you either get to a couple not associated with any word, or to a couple among those that necessarily follow couple A. In this second case, we are basically creating a loop.

White Hat SEOOk.. I have now explained to you what a Markov chain is, or at least I have explained them to you from the point of view of random text production. At this point spammers can stop reading. This is what my friends White SEOs might be interested in:

The question to be asked is: why do spammers use these texts? Because they are lazy? Or just because they want to reach many rankings right away and for many keywords without having to write the contents?
Sure! 99% do it for that reason, but the rest 1% is wondering: why do search engines “like” these texts?

My idea is that these texts, if they are produced correctly, can interpret the trend of a SERP both from the point of view of the topic and of the language. This way, they climb up the very SERP, because search engines consider them appropriate.

This kind of pages may be considered as a good feeler to roughly comprehend what is the trend of a SERP and to understand how to create clean pages based on the results of the Markov pages.

Ranking processes most likely rely on the mathematical/geometric idea of similitude. Therefore a search engine will consider appropriate a page that is similar to all the other pages that have been considered interesting inside a certain SERP and a page that is, at the same time, different from each one of these pages, as much as the copyproof filters are overcome.

It can be deduced that Search Engines Trend can actually exist, at least as an ideal model a web page tends to, determining its ranking in a certain SERP.