The semantic core: how to compose it correctly. Complete instructions for collecting and grouping a semantic core, with an example of a correct core.


The semantic core of a site consists of the keywords (queries) that users type into search engines to find the services, products and other information that the site offers. For webmasters, it is an action plan for promoting the resource. Ideally, the semantic core of a site is created once, before optimization and promotion begin.


The semantic core of a website is usually compiled in several stages:

  1. All sorts of words (phrases) relevant to the site's topic are selected. At first, you can limit yourself to 100–200 search queries. To know which queries suit you, answer the question "What do I want to dedicate my site to?"
  2. The semantic core is expanded with associative queries.
  3. Unsuitable words are eliminated. Here you filter out the phrases you will not use to promote your site; usually more than half of the collected words fall into this category.
  4. Highly competitive queries for which there is no point in promoting the site are removed. Typically, three words out of five or more are dropped.
  5. Finally, the list of search queries is distributed across the pages of the resource. It is recommended to assign highly competitive queries to the main page of the resource; less competitive ones should be grouped by meaning and placed on other pages. To do this, create an Excel document and break the keywords down by page (see the sketch below).
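As an illustration of step 5, here is a minimal Python sketch of such a keyword-to-page breakdown. The keyword list and page paths are hypothetical; the actual assignment of queries to pages remains a manual judgment call, and the script only records the result in a table that opens in Excel.

```python
import csv
from collections import defaultdict

# Hypothetical keyword -> page mapping; in practice you assign pages by hand
# while reviewing the collected queries (meaning, frequency, competition).
keyword_to_page = {
    "lunch delivery": "/",                       # highly competitive -> home page
    "lunch delivery price": "/prices/",
    "office lunch delivery": "/office-lunches/",
    "business lunch delivery": "/office-lunches/",
}

# Group keywords by their target page.
pages = defaultdict(list)
for keyword, page in keyword_to_page.items():
    pages[page].append(keyword)

# Write a simple table that opens in Excel: one row per page with its keywords.
with open("semantic_core.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["page", "keywords"])
    for page, keywords in pages.items():
        writer.writerow([page, ", ".join(keywords)])
```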

Selection of search queries and checking frequency

The first thing you need to do is collect as many different queries as possible on your topic that interest users on the Internet. There are two methods for this:

  • Free ones: Yandex Wordstat, Slovoeb, the old-fashioned manual way, Google hints (the External Keyword Tool), analysis of competitors' semantics, and search suggestions.
  • Paid ones: Key Collector, Semrush, the Pastukhov databases and some other services.

These tools suit different purposes (for example, Semrush works best for Western, non-Russian markets). Of course, all of this can be delegated to optimizers, but then there is a risk of receiving an incomplete semantic core.

Many people use Pastukhov’s database to collect key phrases, but with Key Collector it is much more convenient to collect queries from Yandex and Google statistics services.

At the initial stage, it is better to collect queries in Excel; it looks like this:


If Google is more important for your resource, then focus on it, but also take into account and analyze keywords from Yandex. It is also very important to collect a long tail of low-frequency queries; they will get you traffic much faster.

Another option you can use is to find out key phrases (words) from your competitors and use them. At this stage, you simply collect as many key phrases (words) that are relevant to the topic of your resource as possible, and then move on to the next stage - filtering.

Analyzing queries and removing empty ones

This stage is simpler: here you filter out dummy (empty) queries and those unrelated to the site's topic. For example, you deliver lunches in Kyiv, but the list contains other cities.

How to identify empty requests? Go to Yandex Wordstat and enter the keyword:


You see 881 impressions per month, but to be more precise:


Now a completely different picture emerges. This may not be the best example, but the main thing is that you get the idea: there are many key phrases that appear to have decent traffic, although in reality there is almost nothing behind them. That is why such phrases need to be weeded out.

For example, if a person entered another phrase in the search bar before (or after) typing the query "lunch delivery" within one search session, Yandex assumes that these search phrases are somehow related. If such a relationship is observed across several people, these associative queries are shown in the right-hand column of Wordstat.


Such search queries are sorted in the Wordstat window in descending order of how often they were entered together with the main query over the past month (their frequency of use in Yandex is shown). Use this information to expand the semantic core of your resource.

Distribution of requests across pages

After this, you need to distribute the keywords (phrases) you collected on the pages of your site. Distribution is much easier when you don’t yet have the pages themselves.

Focus primarily on the keywords in search queries and their frequency. Deal with competition like this: allocate the home page to one or two highly competitive queries.

For moderately competitive or low competitive queries, optimize section and article pages accordingly.

If search queries are semantically similar, simply collect those phrases and assign them to one group. When compiling keywords to promote a resource, always use not only standard tools but also a creative approach.

By combining non-standard and classical methods, you can simply and quickly create the semantic core of a site, choose the most optimal promotion strategy and achieve success much faster!


Like almost all other webmasters, I create a semantic core using the Key Collector program - it is certainly the best program for compiling a semantic core. How to use it is a topic for a separate article, although the Internet is full of information on this subject - I recommend, for example, the manual by Dmitry Sidash (sidash.ru).

Since the question was asked about an example of compiling a core, I will give an example.

List of keys

Let's say our site is dedicated to British cats. I enter the phrase “British cat” into the “List of phrases” and click on the “Parse” button.

I get a long list of phrases that begins with the following (the phrase and its frequency are given):

British cats 75553
British cats photo 12421
British fold cat 7273
British cat nursery 5545
British breed cats 4763
British shorthair cat 3571
colors of British cats 3474
British cats price 2461
blue British cat 2302
British fold cat photo 2224
mating of British cats 1888
British cats character 1394
I will buy a British cat 1179
British cats buy 1179
long-haired British cat 1083
pregnancy of a British cat 974
British chinchilla cat 969
cats of the British breed photo 953
nursery of British cats Moscow 886
color of British cats photo 882
British cats care 855
British shorthair cat photo 840
Scottish and British cats 763
names of British cats 762
British blue cat photo 723
British blue cat photo 723
British black cat 699
what to feed British cats 678

The list itself is much longer; I have only given the beginning.

Key grouping

Based on this list, my website will have articles about the types of these cats (fold-eared, blue, shorthair, longhair), an article about their pregnancy, about what to feed them, about names, and so on down the list.

For each article, one main query is taken (it becomes the topic of the article). However, the article is not limited to that single query - other relevant queries are added to it, as well as different variations and word forms of the main query, which can be found in Key Collector below the list.

For example, with the word “fold-eared” there are the following keys:

British fold cat 7273
British fold cat photo 2224
British fold cat price 513
cat breed British fold 418
British blue fold cat 224
Scottish fold and British cats 190
British fold cats photo 169
British fold cat photo price 160
british fold cat buy 156
british fold blue cat photo 129
British Fold cats character 112
British Fold cat care 112
mating of British Fold cats 98
British shorthair Fold cat 83
color of British Fold cats 79

To avoid over-spam (which can also arise from using too many keys in the text, in the title and so on), I would not take all of them with the main query included, but it does make sense to use individual words from them in the article (photo, buy, character, care, etc.) so that the article ranks better for a large number of low-frequency queries.

Thus, under the article about fold-eared cats, we will form a group of keywords that we will use in the article. Groups of keywords for other articles will be formed in the same way - this is the answer to the question of how to create the semantic core of the site.

Frequency and competition

There is also an important point related to exact frequency and competition - they must be collected in Key Collector. To do this, tick all the queries and, on the "Yandex.Wordstat Frequencies" tab, click "Collect frequencies "!"" - the exact frequency of each phrase will be shown (i.e., with exactly this word order and these word forms), which is a much more accurate indicator than the overall frequency.

To check competition in the same Key Collector, click "Get data for Yandex" (or for Google), then click "Calculate KEI using available data." As a result, the program will collect, for each query, how many main pages are in the TOP 10 (the more there are, the harder it is to get in) and how many pages in the TOP 10 contain the query in their title (likewise, the more, the harder it is to break into the top).
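Key Collector's own KEI formula is not spelled out here, so purely as an illustration, a naive competition score can be built from those same two signals. The weights below are arbitrary assumptions, not the program's actual formula.

```python
def competition_score(main_pages_in_top10: int, titles_with_query_in_top10: int) -> float:
    """Naive competition estimate on a 0..1 scale.

    Both inputs are counts within the TOP 10 (0..10). Main pages are weighted
    more heavily than title matches, reflecting the idea that competing
    against home pages is hardest. The weights are illustrative only.
    """
    return (0.7 * main_pages_in_top10 + 0.3 * titles_with_query_in_top10) / 10


# Example: 6 of the TOP 10 results are home pages, 8 have the query in the title.
print(competition_score(6, 8))  # 0.66 -> a fairly competitive query
```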

Next we need to act based on what our strategy is. If we want to create a comprehensive site about cats, then the exact frequency and competition are not so important to us. If we only need to publish a few articles, then we take requests that have the highest frequency and at the same time the lowest competition, and write articles based on them.

If you know the pain of search engines' "dislike" for the pages of your online store, read this article. I will talk about the path to increasing a site's visibility, or more precisely, about its first stage - collecting keywords and compiling a semantic core: the algorithm for creating it and the tools used.


Why create a semantic core?

To increase the visibility of site pages - so that Yandex and Google search robots start finding your site's pages for user queries. Of course, collecting keywords (compiling the semantics) is the first step towards this goal. Next, a rough "skeleton" is sketched out to distribute keywords among different landing pages. And then articles and meta tags are written and implemented.

By the way, on the Internet you can find many definitions of the semantic core.

1. “The semantic core is an ordered set of search words, their morphological forms and phrases that most accurately characterize the type of activity, product or service offered by the site.” Wikipedia.

To collect competitor semantics in Serpstat, enter one of the key queries, select a region, click “Search” and go to the “Key phrase analysis” category. Then select “SEO Analysis” and click “Phrase Selection”. Export results:

Using Key Collector/Slovoeb to create a semantic core

If you need to create a semantic core for a large online store, you cannot do without Key Collector. But if you are a beginner, it is more convenient to use a free tool - Slovoeb (don't let the name scare you). Download the program, and in the Yandex.Direct settings specify the login and password of your Yandex.Mail account:
Create a new project. In the "Data" tab, select the "Add phrases" function. Select your region and enter the queries you collected earlier:
Advice: create a separate project for each new domain, and a separate group for each category/landing page. Now collect semantics from Yandex.Wordstat: open the "Data collection" tab and choose "Batch collection of words from the left column of Yandex.Wordstat". In the window that opens, tick "Do not add phrases if they are already in any other groups", enter a few of the most popular (high-frequency) user phrases and click "Start collecting":

By the way, for large projects in Key Collector you can collect statistics from competitor analysis services SEMrush, SpyWords, Serpstat (ex. Prodvigator) and other additional sources.

The semantic core is the basis for website promotion on the Internet. Without it, it will not be possible to bring the site to the top for a long time. We will tell you what it is made of, where to look for it, and what tools to use for this.

What is the semantic core

To simplify understanding, let's assume that the semantic core (SC) is all those words, phrases and their variations that fully describe the content of your site. The more accurate and better the core is assembled, the easier it is to promote the site.

Roughly speaking, it is one big, long list of words and phrases (keywords) by which users search for similar products and services. There are no universal recommendations for core size, but there is one rule: the bigger and better quality, the better. The main thing is not to inflate the size artificially just to make the core larger. If you chase size at the expense of quality, all the work will go down the drain - the core simply won't work.

Let's use an analogy. Imagine you are the head of a large construction company that needs to build many projects in a short time. You have an unlimited budget, but you must hire at least a hundred people - that is the union's demand. Which hundred people will you hire for such a responsible job - anyone at all, or will you select them carefully, since the budget allows it? Whoever you recruit, you will have to build houses with them. It is reasonable to assume that you will choose carefully, because the result depends on it.

It's the same with the core. For it to work even at a basic level, it should contain at least a hundred keys. And if you stuff anything at all into the core just to make it bigger, the result is guaranteed to be a failure.

General rules for constructing a semantic core

One query - one page. You need to understand exactly which single page the user should be sent to for each query. You cannot have several pages per query: internal competition arises and the quality of promotion drops sharply.

The user receives predictable content for their query. If a customer is looking for shipping options in their area, don't send them to the site's home page if that information isn't there. Sometimes it turns out, after compiling the core, that new pages need to be created for some search queries. This is normal and common practice.

The core contains all types of queries (high-, mid- and low-frequency - HF, MF, LF). More on frequency below; just keep this rule in mind as you read on. Simply put, you should distribute all these queries across specific pages of your site.

An example of a core distribution table for site pages.

Methods for collecting the core

Wrong: copy from competitors

A method for when there is no time or money, but you need to assemble a core somehow. We find several of our direct competitors - the stronger the better - and then use, for example, spywords.ru to get a list of their keys. We do this with each of them, combine the queries, throw out duplicates - and get a base we can at least start from.

The disadvantages of this approach are obvious: there is no guarantee that you should promote for the same queries, and parsing and putting such a core in order can take a lot of time.

It also happens that even near-identical competitors have specific queries of their own that apply to them but not to you. Or they focus on something that you don't do at all - then those keys work in vain and drag down your rankings.

On the other hand, to bring such a base back to normal, it takes a lot of time, effort, and sometimes money to pay for such work. When you start to consider the economics (and this should always be done in marketing), you often realize that the costs of creating your core from scratch will be the same, or even less.

We do not recommend using this method, unless you have a complete disaster with the project and need to somehow get started. Anyway, after launch you will have to redo almost everything, and the work will be useless.

Right: build the semantic core yourself from scratch

To do this, we fully study the site, understand what audience we want to attract, with what problems, requirements and questions. We think about how they will search for us, relate this to the target audience, and adjust goals if necessary.

This kind of work takes a lot of time, and it's impossible to do it all in a day. In our experience, the minimum time to assemble a core is a week, provided the person works full-time on this project alone. Remember that the semantic core is the foundation of promotion. The more accurately we compile it, the easier all the other stages will be.

There is one danger that beginners forget about. The semantic core is not something that is done once and for life. It is worked on constantly: the business, queries and keywords change. Something disappears, something becomes outdated, and all this needs to be reflected in the core immediately. This does not mean you can do it poorly at first on the grounds that you'll be reworking it later anyway. It means that the more accurate the core is, the faster you can make changes to it.

Such work is expensive from the start, even in-house (if you do not order it from a freelancer or an external company), because it requires qualifications, an understanding of how search works, and complete immersion in the project. The core cannot be built in spare moments; it should be the main task of an employee or a department.

Search frequency shows how often a word or phrase is searched for per month. There are no formal criteria for dividing by frequency; it all depends on the industry and profile.

For example, the phrase “buy a phone on credit” has 7,764 requests per month. For the phone market, this is a mid-frequency request. There is something that is asked much more often: “buy a phone” - more than a million requests, a high-frequency request. And there is something that is asked much less frequently: “buy a phone on credit via the Internet” - only 584 requests, low frequency.

And the phrase “buy a drilling rig” has only 577 impressions, but is considered a high-frequency query. This is even less than the low-frequency query from our previous example. Why is this so?

The fact is that the phone market and the drilling-rig market differ in unit volume by a factor of thousands, and the number of potential clients differs just as much. Therefore, what is a lot for one market is very little for another. You always need to look at the market size and know roughly the total number of potential clients in the region where you work.

Dividing requests by relative frequency per month

High-frequency. These should be included in the meta tags of the site's pages and used for general site promotion. Competing for HF queries is extremely difficult; it's easier to simply stay "in trend" - that costs nothing. In any case, include them in the core.

Mid-frequency. These are the same high-frequency ones, but formulated a little more precisely. They are not subject to such fierce competition in the contextual advertising block as with HF, so they can already be used for promotion for money, if the budget allows. Such queries can already lead targeted traffic to your site.

Low-frequency. The workhorse of promotion. When configured correctly, it is low-frequency queries that bring the bulk of the traffic. You can freely advertise on them, optimize site pages for them, or even create new pages if you can't do without them. A good semantic core consists of roughly three-quarters of such queries and is constantly expanded with them.

Ultra-low-frequency. The rarest but most specific queries, for example "buy a phone at night in Tver on credit." Hardly anyone works with them when compiling the core, so there is practically no competition. Their drawback is that they really are asked rarely, yet they take as much time as the others. So it makes sense to handle them once all the main work is done.
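Since the boundaries are relative rather than absolute, one way to make this concrete is to compare each query with the busiest query of the same niche. Below is a minimal sketch; the cut-off percentages are illustrative assumptions, not rules from this article.

```python
def frequency_class(freq: int, niche_max_freq: int) -> str:
    """Classify a query relative to the busiest query in the same niche.

    The cut-offs (5% / 0.5% / 0.05% of the niche maximum) are arbitrary
    illustrations of the idea that HF/MF/LF are relative, not absolute.
    """
    share = freq / niche_max_freq
    if share >= 0.05:
        return "HF"
    if share >= 0.005:
        return "MF"
    if share >= 0.0005:
        return "LF"
    return "ULF"


# Phone market: the top query has about 1,000,000 searches a month.
print(frequency_class(7_764, 1_000_000))   # MF - "buy a phone on credit"
print(frequency_class(584, 1_000_000))     # LF - "buy a phone on credit via the Internet"

# Drilling rigs: the top query itself has only about 577 searches.
print(frequency_class(577, 577))           # HF - small market, still high-frequency
```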

Types of requests depending on purpose

Informational. They are used to learn something new or gain information on a topic. For example: “how to choose a banquet hall” or “what types of laptops are there.” All such requests should lead to information sections: blog, news or collections on topics. If you see that there are a lot of requests for information, but there is nothing to close them on the site, then this is a reason to create new sections, pages or articles.

Transactional. Transaction = action: buy, sell, exchange, receive, deliver, order and so on. Most often, such queries are covered by pages of specific products or services. If most of your transactional queries are high- or mid-frequency, lower the frequency and make the queries more specific. This lets you send people precisely to the required pages instead of leaving them on the main page without specifics.

Other. Requests without a clear intent or action. “Beautiful balls” or “modeling crafts from clay” - it is impossible to say about them specifically why the person asked this. Maybe he wants to buy it. Or learn the technology. Or read more about how to do this. Or he needs someone to do it for him. Not clear. Such requests need to be handled carefully and thoroughly cleared of junk keys.

To promote a commercial website, you should mainly use transactional queries, and avoid informational ones - for them, the search engine shows information portals, Wikipedia, and aggregator sites. And it’s almost impossible to compete with them in terms of promotion.

Garbage keys

Sometimes queries include words or phrases that are not relevant to your industry or that you simply are not involved in. For example, if you only make souvenirs from softwood, you probably don't need the search term "bamboo souvenirs." It turns out that “bamboo” is a garbage element that clogs the core and interferes with the purity of the search.

We collect such keys in a separate list; they will be useful for contextual advertising. We specify them as negative keywords, and then our site will appear in results for the query "pine souvenirs" but not for "bamboo souvenirs".

We do the same throughout the core - we find anything that does not fit the profile, remove it from the semantic core and add it to a separate list.

Each request consists of three parts: a specifier, a body, and a tail.

The general principle is this: the body names the subject of the search, the specifier says what needs to be done with that subject, and the tail details and refines the whole query.

By combining different specifiers and tails for queries, you can get many keywords that suit you, which will be included in the core.

Building the core from scratch, step by step

The very first thing you can do is look through all the pages of your website and write down all the product names and stable phrases of product groups. To do this, look at the headings of categories, sections and main characteristics. We will record everything in Excel, it will be useful in the next stages.

For example, if we have a stationery store, we get the following:

Then we add characteristics to each request - we build up the “tail”. To do this, we find out what properties these products have, what else can be said about them, and write them down in a separate column:

After this, we add “specifiers”: action verbs that relate to our topic. If, for example, you have a store, then this will be “buy”, “order”, “in stock” and so on.

We collect individual phrases from this in Excel:
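This combination step can also be automated. Below is a minimal Python sketch with hypothetical product names, tails and specifiers for a stationery store; the resulting list still needs manual cleaning, exactly as described above.

```python
from itertools import product

# Hypothetical lists; in practice they come from the site's category names,
# product characteristics (tails) and action verbs (specifiers).
bodies = ["notebook", "ballpoint pen", "pencil case"]
tails = ["", "a5", "leather", "for school"]        # "" keeps the bare phrase
specifiers = ["", "buy", "order", "in stock"]

phrases = set()
for specifier, body, tail in product(specifiers, bodies, tails):
    # Skip empty parts so "buy notebook" and plain "notebook" both appear.
    phrase = " ".join(part for part in (specifier, body, tail) if part)
    phrases.add(phrase)

for phrase in sorted(phrases):
    print(phrase)
# e.g. "buy notebook a5", "order pencil case for school", "notebook for school", ...
```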

Collecting extensions

Let's look at three typical tools for collecting the core - two free ones and a paid one.

Free (Yandex Wordstat). We type our phrase into it and get a list of queries similar to ours. We look through it carefully and pick what suits us, and run through everything we got at the first stage in this way. The work is long and tedious.

As a result, you will have a semantic core that reflects the content of your site as accurately as possible. You can now work with it fully as you move forward.

When searching for words, focus on the region where you sell your product or service. If you do not work throughout Russia, switch to the “by region” mode (immediately below the search bar). This will allow you to get an accurate picture of requests in the place you need.

Consider your request history. Demand is not static, which many people forget. For example, if at the end of January you search for the query “buy flowers,” it may seem that almost no one is interested in flowers - there are only a hundred or two queries. But if you search for the same thing in early March, the picture is completely different: thousands of users are looking for this. Therefore, remember about seasonality.

Also free, it helps you find and select keywords, predict queries and provides statistics on effectiveness.

Key Collector. This program is a real combine harvester that can do 90% of the work of collecting a semantic core. But it is paid - almost 2,000 rubles. It searches for keys across many sources, looks at rankings and queries, and collects analytics on the core.

Main features of the program:

  • collection of key phrases;
  • determining the cost and value of phrases;
  • identifying relevant pages.

Everything it can do can also be done using several free analogues, but it will take many times longer. Automation is this program's strong point.

As a result, you receive not only a semantic core, but also full analytics and recommendations for improvement.

Removing trash keys

Now we need to clean up our core to make it even more effective. To do this, we use Key Collector (it will do it automatically), or we look for garbage manually in Excel. At this stage we will need the list of unnecessary, harmful or superfluous queries that we compiled earlier.

Removal of garbage keys can be automated
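For instance, here is a minimal Python sketch of such automation, assuming a hypothetical query list and a hand-compiled list of trash markers; the real work in Key Collector or Excel follows the same logic.

```python
# Hypothetical data: the collected queries and the trash list compiled earlier.
queries = [
    "pine souvenirs buy",
    "bamboo souvenirs",
    "softwood souvenirs price",
    "souvenirs wikipedia",
]
trash_markers = {"bamboo", "wikipedia"}   # off-profile words, "vital" queries, etc.

clean_core, trash = [], []
for query in queries:
    words = set(query.lower().split())
    # Any overlap with the trash markers sends the query to the trash list.
    (trash if words & trash_markers else clean_core).append(query)

print(clean_core)  # ['pine souvenirs buy', 'softwood souvenirs price']
print(trash)       # ['bamboo souvenirs', 'souvenirs wikipedia'] - keep for negative keywords
```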

Grouping requests

Now, after collection, all the queries found need to be grouped. This is done so that keywords close in meaning are assigned to one page rather than scattered across different ones.

To do this, we combine queries that are similar in meaning - those that the same page can answer - and note next to them which page they belong to. If there is no such page but there are many queries in the group, it most likely makes sense to create a new page or even a section on the website to which all such queries can be sent.

An example of grouping, again, can be seen in our worksheet.
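As a rough illustration of the idea (not the method professional clustering tools use), here is a Python sketch that greedily groups queries by word overlap; the query list and the 0.6 similarity threshold are hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    # Share of words the two queries have in common.
    return len(a & b) / len(a | b)


def group_queries(queries, threshold=0.6):
    """Greedy grouping: a query joins the first group whose seed query
    shares enough words with it; otherwise it starts a new group.
    A crude stand-in for grouping by meaning (real tools also compare
    the URLs that the queries share in the search results)."""
    groups = []  # list of (seed word set, [queries])
    for query in queries:
        words = set(query.lower().split())
        for seed, members in groups:
            if jaccard(words, seed) >= threshold:
                members.append(query)
                break
        else:
            groups.append((words, [query]))
    return [members for _, members in groups]


queries = [
    "british fold cat",
    "british fold cat photo",
    "british fold cat price",
    "british cat pregnancy",
    "british cat pregnancy signs",
]
for group in group_queries(queries):
    print(group)
# ['british fold cat', 'british fold cat photo', 'british fold cat price']
# ['british cat pregnancy', 'british cat pregnancy signs']
```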

Use every automation tool you can get your hands on. This saves a lot of time when building the core.

Do not combine informational and transactional requests on one page.

The more low-frequency queries in the texts, the better. But don’t get carried away, don’t turn the text on the site into something understandable only by a robot. Remember that real people will read you too.

Periodically clean and update the core. Make sure the information in the semantic core is always up to date and reflects the current situation. Otherwise you will spend money on something you ultimately cannot offer your clients.

Remember the benefits. In pursuit of search traffic, do not forget that people come from different sources, but stay where they are interested. If your core is always up to date and the text on the pages is written in human, understandable and interesting language, you are doing everything right.

Finally, once again, the core-building algorithm itself:

1. Find all keywords and phrases.

2. Clean them of junk queries.

3. Group the queries by meaning and map them to the pages of the site.



In this post we will describe the complete algorithm for collecting the semantic core primarily for an informational site, but this approach can also be used for commercial sites.

Initial semantics and creation of the site structure

Preparation of words for parsing and the initial structure of the site

Before we start parsing, we need to know which words to parse. So we need to create the initial structure of our site and the initial words for parsing (also called marker words).

You can take the initial structure and words from:

1. Logic and words from your own head (if you understand the topic).
2. Your competitors, whom you analyzed when choosing the niche or by entering your main query.
3. Wikipedia. Usually it looks like this:

4. Wordstat for your main queries, including the right-hand column.
5. Other thematic books and reference books.

For example, the topic of our website is heart disease. It is clear that we must have all heart diseases in our structure.

You cannot do without a medical reference book. I would not look at competitors, because they may not have all diseases represented; most likely, they did not have time to cover them.

Your initial words for parsing will be all the heart diseases, and you will build the site structure from the parsed keys once you start grouping them.

In addition, you can take all the drugs for treating the heart, as an extension of the topic, etc. You look at Wikipedia, categories on competitors’ websites, wordstat, think logically and in this way find more marker words that you will parse.

Site structure

You can look at competitors for general information, but you don’t always have to make a structure like theirs. You should proceed largely from the logic of your target audience; they also enter the queries that you parse from search engines.

For example, how should it work? Do we list all heart diseases and then cover symptoms and treatment under each of them, or do we create categories for symptoms and treatment and list the diseases under those? These questions are usually resolved by grouping keywords based on search engine data. But not always - sometimes you will have to make the choice yourself and decide how best to build the structure, because queries may overlap.

You must always remember that the structure takes shape throughout the collection of semantics: sometimes in its initial form it consists of just a few headings, and it expands with further grouping and collection as you begin to see the queries and the logic behind them. And sometimes you can compose it without looking at the keywords at all, because you know the topic well or it is well covered by competitors. There is no fixed system for creating a site structure; you could say it is a matter of personal creativity.

The structure can be yours (different from competitors), but it must be convenient for people, meet their logic, and therefore the logic of search engines, and such that you can cover all thematic words in your niche. It should be the best and most convenient!

Think ahead. It happens that you take a niche, then want to expand it, and start changing the structure of the entire site. An existing structure on a live site is very difficult and tedious to change: at the very least you will have to change the nested URLs and re-link everything on the site itself. In short, it is tedious and highly responsible work, so decide definitively up front what you will do and how.

If you are very new to the topic of the site you are creating and do not know how the structure will be built, you do not know which initial words to use for parsing, then you can swap stages 1 and 2 of collection. That is, first parse competitors (we’ll look at how to parse them below), look at their keys, based on this create a structure and initial words for parsing, and then parse wordstat, hints, etc.

To create the structure, I use the mind-mapping tool Xmind. It's free and covers all the basics.

A simple structure looks like this:


This is the structure of a commercial website. Typically, information sites do not have intersections or any filters for product cards. But this structure is not complicated, it was compiled for the client so that he would understand. Usually my structures consist of many arrows and intersections, comments - only I myself can understand such a structure.

Is it possible to create semantics as the site is being filled out?

If the semantics are simple and you are confident in the topic and know it, you can work on the semantics in parallel with filling the site. But the initial structure must be laid out. I myself sometimes practice this in very narrow or very broad niches, so as not to spend a lot of time collecting semantics but to launch the site right away; still, I wouldn't recommend it. The likelihood of mistakes is very high if you don't have experience. It is easier when all the semantics are ready, the whole structure is ready and everything is grouped and clear. Besides, in the finished semantics you can see which keywords should get priority attention - which have no competition and will bring more visitors.

You also need to take the size of the site into account: if the niche is broad, there is no point in collecting all the semantics up front - it is better to do it as you go, because collecting semantics can take a month or more.

So, whether we sketched out an initial structure or decided to start with the second stage, we now have a list of initial words or phrases for our topic that we can start parsing.

Parsing and working in Key Collector

For parsing, of course, I use Key Collector. I will not dwell on setting it up - you can read the program's help or find articles about configuration on the Internet; there are plenty of them, and everything is described in detail.

When choosing parsing sources, you should weigh your labor costs against their payoff. For example, if you parse the Pastukhov database or MOAB, you will be buried in a heap of garbage queries that will need to be sifted out, and that takes time - in my opinion, not worth it for a couple of extra queries. There is a very interesting study on keyword databases from Rush Analytics; they do praise themselves there, but if you look past that, it has very interesting data on the percentage of bad keywords: http://www.rush-analytics.ru/blog/analytica-istochnikov-semantics

At the first stage, I parse Wordstat, AdWords and their suggestions, and use the Bukvarix keyword database (the desktop version is free). I also used to look through YouTube suggestions manually, but recently Key Collector added the ability to parse them, which is great. If you want to go all out, you can add other keyword databases here too.

You start parsing and off you go.

Cleaning the semantic core for an information site

We parsed the queries and came up with a list of different words. Of course, it contains the necessary words, as well as garbage ones - empty, not thematic, not relevant, etc. Therefore they need to be cleaned.

I don't delete unnecessary words, but move them into separate groups, because:

  1. They can later become food for thought and become relevant.
  2. This rules out accidentally deleting words.
  3. When parsing or adding new phrases, they will not be added again if the corresponding box is checked.


I sometimes forgot to check that box, so now I set up parsing into a single working group and parse keys only into it, so that collection is not duplicated:


You can work this way or that way, as you wish.

Collection of frequencies

For all words, we collect via Yandex.Direct the base frequency [W] and the exact frequency ["!W"].


Anything that wasn't collected that way, we collect via Wordstat.

Cleaning up single-word queries and non-format queries

We filter by single words, look through them and remove the unnecessary ones. There are single-word queries that make no sense to promote: they are ambiguous or duplicate another single-word query.


For example, our topic is heart disease. There is no point in promoting the word “heart”; it is not clear what the person means - this is too broad and ambiguous a request.

We also look at which words have no frequency collected - either the words contain special characters, or there are more than seven words in the query. We move these to the "non-format" group; it is unlikely that people actually enter such queries.

Cleaning by general and exact frequency

All words with a total frequency [W] from 0 to 1 are removed.

I also remove everything from 0 to 1 by exact frequency [”!W”].

I separate them into different groups.

Later, perfectly normal keywords can still be found among these words. If the core is small, you can immediately review all the zero-frequency words manually and keep those you think people actually enter. This helps cover the topic completely, and perhaps those words will bring some clicks. Naturally, these words should be used last, because they definitely will not bring high traffic.

The value from 0 to 1 is also taken based on the topic; if there are a lot of keywords, then you can filter from 0 to 10. That is, it all depends on the breadth of your topic and your preferences.
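The same cleaning step, as a minimal Python sketch over a hypothetical Wordstat export; the thresholds are the ones discussed above and remain a matter of taste.

```python
# Hypothetical export: (phrase, base frequency [W], exact frequency ["!W"]).
phrases = [
    ("heart disease symptoms", 4500, 1200),
    ("heart hurts what to do forum", 90, 1),
    ("aching heart poem", 40, 0),
]

BASE_MIN, EXACT_MIN = 2, 2   # the 0..1 cut-off from the text; raise to ~10 for huge cores

kept = [p for p in phrases if p[1] >= BASE_MIN and p[2] >= EXACT_MIN]
dropped = [p for p in phrases if p not in kept]   # move to a separate group, don't delete

print([p[0] for p in kept])     # ['heart disease symptoms']
print([p[0] for p in dropped])  # ['heart hurts what to do forum', 'aching heart poem']
```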

Cleaning by completeness of coverage

The theory here is this: take the word "forum", for example - its base frequency is 8,136,416, while its exact frequency is 24,377. As we can see, the difference is more than 300-fold, so we can assume that this query is largely empty: it includes a lot of tails.

Therefore, for all queries I calculate the following KEI:

Exact Frequency / Base Frequency * 100% = Completeness of Coverage

The lower the percentage, the more likely it is that the word is empty.

In KeyCollector this formula looks like this:

YandexWordstatQuotePointFreq / (YandexWordstatBaseFreq+0.01) * 100

Here, too, everything depends on the topic and the number of phrases in the core: you can cut off everything with coverage below 5%, and where the core is large you can even discard queries below 10-30%.
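A small sketch of the same calculation in Python, using the "forum" numbers from the text plus one hypothetical phrase; the 5% threshold is illustrative.

```python
def coverage(exact_freq: int, base_freq: int) -> float:
    """Completeness of coverage, % - same formula as the KeyCollector KEI above."""
    return exact_freq / (base_freq + 0.01) * 100


# The "forum" example from the text: base 8,136,416 vs exact 24,377.
print(round(coverage(24_377, 8_136_416), 2))   # ~0.3% -> almost certainly an "empty" query

MIN_COVERAGE = 5.0   # threshold depends on topic and core size (5-30%)
phrases = [
    ("forum", 8_136_416, 24_377),
    ("british fold cat price", 513, 370),   # hypothetical exact frequency
]
kept = [p for p in phrases if coverage(p[2], p[1]) >= MIN_COVERAGE]
print([p[0] for p in kept])    # ['british fold cat price']
```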

Cleaning up implicit duplicates

To clean up implicit duplicates, we collect their AdWords frequency and use it as a guide, because it takes word order into account. To save resources, we collect this indicator not for the entire core but only for the duplicates.


In this way we have found and marked all the implicit duplicates. Close the "Analysis of implicit duplicates" tab - they are now marked in our working group. Next, display only them, because parameters are retrieved only for the phrases currently shown in the group, and only then start parsing.


We wait until AdWords has collected the figures, then go back into the implicit-duplicates analysis.


We set these parameters for smart group marking and click "Perform smart check". This way, in each group of duplicates, only the query with the highest AdWords frequency will be left unmarked.

It is better, of course, to go through all the duplicates and check them manually in case something slipped through incorrectly. Pay special attention to groups with no frequency data - duplicates there are marked more or less at random.

Everything you mark in the implicit-duplicates analysis is also marked in the working group. So after finishing the analysis, simply close the tab and move all the marked implicit duplicates to the appropriate folder.
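For those who want to see the logic outside of KeyCollector, here is a minimal Python sketch of the same idea, with hypothetical phrases and AdWords frequencies: phrases made of the same words in a different order are grouped together, and only the highest-frequency wording is kept.

```python
from collections import defaultdict

# Hypothetical (phrase, AdWords frequency) pairs.
phrases = [
    ("buy british fold cat", 140),
    ("british fold cat buy", 90),
    ("british cat fold buy", 5),
    ("british fold cat price", 160),
]

groups = defaultdict(list)
for phrase, freq in phrases:
    key = frozenset(phrase.lower().split())   # word order is ignored
    groups[key].append((phrase, freq))

keep, duplicates = [], []
for variants in groups.values():
    variants.sort(key=lambda pf: pf[1], reverse=True)
    keep.append(variants[0])          # the highest-frequency wording survives
    duplicates.extend(variants[1:])   # the rest go to the duplicates group

print(keep)        # [('buy british fold cat', 140), ('british fold cat price', 160)]
print(duplicates)  # [('british fold cat buy', 90), ('british cat fold buy', 5)]
```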

Cleaning with stop words

I also divide stop words into groups. I list cities separately. They may come in handy in the future if we decide to create a directory of organizations.

Separately, I list words containing the words photo, video. Perhaps they will come in handy someday.

I also separate out "vital" (navigational) queries - Wikipedia, for example; I include forums here, and in a medical topic this can also cover Malysheva, Komarovsky and the like.

It all depends on the topic. You can also make separate commercial requests - price, buy, store.

This results in a list of groups based on stop words:

Cleaning up artificially inflated queries

This applies to competitive topics; competitors often artificially inflate query statistics to mislead you. So you need to collect seasonality data and remove all words with a median of 0.

You can also look at the ratio of the base frequency to the average: a large gap may likewise indicate that the query has been artificially inflated.

But we must understand that these indicators may also indicate that these are new words for which statistics have only recently appeared or that they are simply seasonal.
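A minimal sketch of such a check, with hypothetical monthly statistics and an arbitrary ratio limit; it only flags suspects, which still need a manual review for exactly the reasons just mentioned.

```python
from statistics import mean, median


def looks_inflated(monthly_freqs, base_freq, ratio_limit=20):
    """Flag a query as possibly artificially inflated.

    monthly_freqs: the seasonality history (searches per month).
    Flags when the median month is zero, or when the base frequency is far
    above the typical month. ratio_limit is an illustrative guess, not a rule.
    """
    if median(monthly_freqs) == 0:
        return True
    return base_freq / mean(monthly_freqs) > ratio_limit


# One burst month and nothing else -> suspicious.
print(looks_inflated([0, 0, 0, 0, 5000, 0, 0, 0, 0, 0, 0, 0], 5000))                       # True
# Stable demand all year -> fine.
print(looks_inflated([400, 380, 450, 420, 390, 410, 405, 395, 430, 415, 425, 440], 420))   # False
```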

Cleaning by geo

Usually, checking by geo is not required for information sites, but just in case, I’ll write this down.

If there are doubts about whether some queries are geo-dependent, it is better to check this via Rookee collection; although it sometimes makes mistakes, it does so much less often than a check of this parameter via Yandex. After collecting the Rookee data, you should manually check all the words it flagged as geo-dependent.

Manual cleaning

Now our core has become several times smaller. We review it manually and remove unnecessary phrases.

At the output we get these groups in our core:

Yellow - it’s worth digging around, you can find words for the future.

Orange - may come in handy if we expand the site with new services.

Red - not useful.

Analysis of request competition for information sites

Having collected the requests and cleaned them, we now need to check their competition in order to understand in the future which requests should be dealt with first.

Competition based on the number of documents, titles, main pages

This can all be easily done via KEI in KeyCollector.


For each query we get the number of documents found in the search engine (in our example, Yandex), the number of main pages in the search results for that query, and the number of occurrences of the query in titles.

On the Internet you can find various formulas for calculating these indicators; even a freshly installed KeyCollector seems to have some KEI formula built in by default. I don't follow them, because each of these factors carries a different weight: the most important is the presence of main pages in the results, then titles, then the number of documents. It is unlikely that these weights can be captured properly in a formula, and even if they could, you couldn't do it without a mathematician - and then the formula would no longer fit into KeyCollector's capabilities.

Competition on link exchanges

This is where it gets more interesting. Each exchange has its own algorithm for calculating competition, and we can assume they take into account not only the presence of main pages in the results but also page age, link mass and other parameters. These exchanges are, of course, designed mainly for commercial queries, but some conclusions can still be drawn for informational queries as well.

We collect data from the exchanges, derive average values, and then use them as a guide.


I usually collect from 2-3 exchanges. The main thing is that all queries are collected from the same exchanges and the average is derived across those same exchanges - not with some queries collected from some exchanges and others from different ones, and then averaged.

For a more visual view, you can use the KEI formula, which will show the cost of one visitor based on the parameters of the exchanges:

KEI = AverageBudget / (AverageTraffic +0.01)

Dividing the average budget across the exchanges by their average traffic forecast gives the cost of one visitor, based on the exchange data.

Competition via Mutagen

Mutagen is not available in KeyCollector, but that's not a problem: all the words can easily be exported to Excel and then run through Mutagen.

How is Keyso better? It has a larger database than its competitors. Its semantics are clean - there are no phrases that are duplicated or merely written in a different word order; for example, you will not find repeated keys like "type 1 diabetes" and "diabetes type 1" there.

Keyso can also find sites that share a single AdSense, Analytics, Leadia or similar counter - you can see what other sites the owner of the analyzed site runs. And in general, when it comes to finding competitors' sites, I think it is the best solution.

How to work with Keyso?

We take any one competitor site - more is better, of course, but it isn't critical, because we will work in two iterations. We enter it into the field and click "Analyze".

We get information on the site; what interests us here is the competitors, so we click "Open all".


The full list of our competitors opens.


These are all sites whose keywords at least somehow overlap with our analyzed site. There will be youtube.com, otvet.mail.ru, etc., that is, large portals that write about everything. We don’t need them, we need sites purely on our topic. Therefore, we filter them according to the following criteria.

Similarity is the percentage of shared keys out of the total number of keys of the given domain.

Thematicity is the number of keys of our analyzed site found among the keys of the competitor's domain.

Intersecting these two parameters therefore filters out the general-purpose sites.

Let's set thematicity to 10, similarity to 4 and see what we get.

There were 37 competitors. But we will still check them manually, upload them to Excel and, if necessary, remove unnecessary ones.
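If you prefer to apply the same filter to an exported list, here is a minimal Python sketch with hypothetical domains and the thresholds from the text.

```python
# Hypothetical export of the Keyso competitor list.
competitors = [
    {"domain": "youtube.com",         "similarity": 1,  "thematicity": 2},
    {"domain": "otvet.mail.ru",       "similarity": 2,  "thematicity": 6},
    {"domain": "cardio-site.example", "similarity": 18, "thematicity": 35},
    {"domain": "heart-blog.example",  "similarity": 7,  "thematicity": 12},
]

SIMILARITY_MIN, THEMATICITY_MIN = 4, 10   # the thresholds used in the text

niche_competitors = [
    c for c in competitors
    if c["similarity"] >= SIMILARITY_MIN and c["thematicity"] >= THEMATICITY_MIN
]
print([c["domain"] for c in niche_competitors])
# ['cardio-site.example', 'heart-blog.example'] - big general portals are filtered out
```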


Now go to the group report tab and enter all our competitors that we found above. Click – analyze.

We get a list of keywords for all these sites. But we have not yet covered the topic completely, so we go to the competitors of this group.

Now we get all the competitors of all the sites we entered. There are several times more of them, and many are again general-topic sites. We filter them by similarity - say, 30.

We get 841 competitors.


Here we can see how many pages each site has and how much traffic it gets, and draw conclusions about which competitor is the most effective.

We export all of them to Excel, go through them by hand and keep only the competitors in our niche. You can mark the most effective ones so that you can later study them and see what features they have on their sites and which queries bring them a lot of traffic.

Now we go back to the group report and add all the competitors that have already been found and get a list of keywords.

Here we can immediately filter the list by exact frequency ("!wordstat") greater than 10.


These are our queries; now we can add them to KeyCollector, specifying that phrases already present in any other KeyCollector group should not be added.

Now we clean up our keys and expand and group our semantic core.

Semantic core collection services

In this industry there are quite a few companies offering clustering services. If you are not ready to spend the time learning the intricacies of clustering and doing it yourself, you can find many specialists willing to do this work for you.

Yadrex

One of the first on the market to use artificial intelligence to create a semantic core. The head of the company is himself a professional webmaster and SEO technology specialist, so he guarantees the quality of work of his employees.

In addition, you can call the indicated numbers to get answers to all your questions regarding the work.

When ordering the service, you receive a file with the content groups of the core and its structure. Additionally, you get the structure in MindMup.

The cost of the work varies with volume: the larger the order, the cheaper each key. The maximum price for an informational project is 2.9 rubles per key; for a commercial (selling) site, 4.9 rubles per key. Discounts and bonuses are provided for large orders.

Conclusion

This completes the creation of the semantic core for the information site.

I advise you to monitor the history of changes to the KeyCollector program, because it is constantly updated with new tools, for example, YouTube was recently added for parsing. With the help of new tools, you can further expand your semantic core.
