Magento 2: Fix Chinese search terms showing in your dashboard

Have you spotted Chinese characters under “Last search terms” in your Magento 2 Dashboard, or inbound links pointing at your search results page in your Google webmasters console?

If so, you have two issues.

This happens because scrapers from China, the USA, and Europe run searches on your website and scrape the content. Worse, they then publish that content on spammy websites that link back to your search results page, which Google crawls and lists in its search results.

1. The external links pointing at your website probably come from low-quality, unrelated content designed to get your website penalized. Googlebot follows those links and ends up on your search results page.
2. As Google crawls those links, your search results page gets flagged as an issue.

Let’s see what this results in.

Here’s the fix.

Step 1.

Look for catalogsearch_result_index.xml, which is located in /vendor/magento/module-catalog-search/view/frontend/layout or possibly within your theme layout.
Now add
<head>
    <meta name="robots" content="NOINDEX,NOFOLLOW"/>
</head>

After
<page xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" layout="2columns-left" xsi:noNamespaceSchemaLocation="urn:magento:framework:View/Layout/etc/page_configuration.xsd">
and before the <body> tag in this file.

When you’ve finished it should look like this

<?xml version="1.0"?>
<!--
/**
 * Copyright © Magento, Inc. All rights reserved.
 * See COPYING.txt for license details.
 */
-->
<page xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" layout="2columns-left" xsi:noNamespaceSchemaLocation="urn:magento:framework:View/Layout/etc/page_configuration.xsd">
    <head>
        <meta name="robots" content="NOINDEX,NOFOLLOW"/>
    </head>
    <body>
        <attribute name="class" value="page-products"/>

Step 2.

Look for catalogsearch_advanced_result.xml in the same location and repeat.

Now flush the cache, navigate to the frontend, load the search results page, and view the page source. You should see
<meta name="title" content="Search results for: 'Chinese garbage'"/>
<meta name="robots" content="NOINDEX,NOFOLLOW"/>
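If you would rather not check the page source by eye, the check can be scripted. Here is a minimal Python sketch that parses a page's HTML and reports whether a robots meta tag containing NOINDEX is present. The class and function names are made up for this example, and you would need to fetch the HTML of your own search results page yourself.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Remembers the content of the last <meta name="robots"> tag seen."""

    def __init__(self):
        super().__init__()
        self.robots = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            self.robots = attributes.get("content") or ""


def is_noindex(html: str) -> bool:
    """Return True if the page carries a robots meta tag containing NOINDEX."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.robots is not None and "noindex" in finder.robots.lower()


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="NOINDEX,NOFOLLOW" /></head></html>'
    print(is_noindex(page))
```

Run it against both the normal and the advanced search results pages, since each one uses its own layout file.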

Step 3.

Use phpMyAdmin to clean your database. Open the table “search_query”.
Now delete the entries containing Chinese characters. There were 175,000 on the last one I fixed.
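With that many rows, deleting by hand gets tedious. Here is a rough Python sketch of how you might pick out the offending rows from an export of the table. It assumes the export gives you (query_id, query_text) pairs and that the spam terms fall inside the common CJK ideograph ranges. Both are assumptions, so check them against your own table (including any table prefix) first. It only builds a DELETE statement for you to review and does not touch the database.

```python
import re

# Assumption: the spam terms use characters from these CJK ranges
# (Extension A, Unified Ideographs, Compatibility Ideographs).
CJK_PATTERN = re.compile(r"[\u3400-\u4dbf\u4e00-\u9fff\uf900-\ufaff]")


def contains_cjk(term: str) -> bool:
    """Return True if the term contains at least one CJK ideograph."""
    return CJK_PATTERN.search(term) is not None


def split_terms(rows):
    """Split (query_id, query_text) rows into IDs to delete and rows to keep."""
    delete_ids = [qid for qid, text in rows if contains_cjk(text)]
    keep = [(qid, text) for qid, text in rows if not contains_cjk(text)]
    return delete_ids, keep


def build_delete_sql(ids, table="search_query"):
    """Build a DELETE statement for manual review; IDs are forced to integers."""
    if not ids:
        return ""
    id_list = ", ".join(str(int(i)) for i in ids)
    return f"DELETE FROM {table} WHERE query_id IN ({id_list});"


if __name__ == "__main__":
    sample = [(1, "red shoes"), (2, "测试关键词"), (3, "blue jacket 官网")]
    ids, kept = split_terms(sample)
    print(build_delete_sql(ids))
```

Take a backup of the table before running any generated statement. If you have genuine customers who search in Chinese, this pattern will catch their terms too.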

Step 4.

Compile a list of the domains that are pushing these links at your site, check each one manually, and then disavow them.
To disavow them, create a .txt file saved as UTF-8 and add one domain per line, like this:

domain:badsite.com
domain:badsite2.com

Once you have the list go to https://search.google.com/search-console/disavow-links and upload the file.

Check your backlinks each week using Moz.com or ahrefs.com and repeat this step.
This needs to become a habit to protect your website.
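Since this is a weekly job, a small script can take some of the tedium out of it. Here is a minimal Python sketch that turns a mixed list of backlink URLs and hostnames (for example, pasted from a backlink report) into de-duplicated disavow lines. The function names are invented for this example, and stripping a leading "www." is my own assumption about how you want domains grouped. You still need to check each domain manually before uploading.

```python
from urllib.parse import urlparse


def to_domain(entry: str) -> str:
    """Reduce a URL or bare hostname to a lowercase host without 'www.'."""
    entry = entry.strip().lower()
    # urlparse only fills in the host when a scheme is present.
    if "://" not in entry:
        entry = "http://" + entry
    host = urlparse(entry).hostname or ""
    return host[4:] if host.startswith("www.") else host


def build_disavow(entries) -> str:
    """Return disavow file content with one 'domain:' line per unique domain."""
    domains = sorted({to_domain(e) for e in entries if e.strip()})
    return "".join(f"domain:{d}\n" for d in domains if d)


if __name__ == "__main__":
    links = ["https://www.badsite.com/page?x=1", "badsite2.com", "badsite.com"]
    # Save this output as a UTF-8 .txt file before uploading it.
    print(build_disavow(links), end="")
```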

Final steps

Add the following to your robots.txt file

User-agent: *
Disallow: /search/*
Disallow: /result/*
Disallow: /catalogsearch/result/?

You may also need to add

User-agent: Bingbot
Disallow: /result/*
Disallow: /catalogsearch/result/?

You should monitor the IP addresses that these searches come from, as they will create a few issues in your admin panel.

You will need access to the log files in your hosting control panel to spot the searches. Add the IP ranges to the top of your .htaccess file as you find them.

Here are a few to get you started.

# Block each /24 range; Apache's deny directive takes CIDR notation, not * wildcards
Order deny,allow
deny from 38.77.197.0/24
deny from 213.157.187.0/24
deny from 27.125.240.0/24
deny from 104.165.187.0/24
deny from 198.240.80.0/24
deny from 89.36.220.0/24
deny from 64.64.102.0/24
deny from 54.36.148.0/24
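To spot new ranges in your log files, something like the following Python sketch may help. It assumes your access log is in the usual Apache/Nginx "combined" format with IPv4 client addresses, and that searches hit /catalogsearch/result/. The function names and the threshold are made up for this example. It counts search requests per /24 range and prints deny lines in CIDR notation for the busiest ones.

```python
import re
from collections import Counter

# Assumption: "combined" log format, where each line starts with the client
# IPv4 address and the request appears as "GET /path HTTP/1.1".
LOG_LINE = re.compile(r'^(\d{1,3}(?:\.\d{1,3}){3}) .*?"(?:GET|POST|HEAD) (\S+)')
SEARCH_PATH = "/catalogsearch/result/"


def search_hits_by_range(lines):
    """Count search-page requests per /24 range (first three octets)."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group(2).startswith(SEARCH_PATH):
            prefix = match.group(1).rsplit(".", 1)[0]
            counts[prefix + ".0/24"] += 1
    return counts


def deny_lines(counts, threshold=2):
    """Turn ranges with at least `threshold` hits into deny directives."""
    return [f"deny from {rng}" for rng, n in counts.most_common() if n >= threshold]


if __name__ == "__main__":
    sample = [
        '192.0.2.10 - - [01/Jan/2024:00:00:01 +0000] "GET /catalogsearch/result/?q=a HTTP/1.1" 200 512 "-" "bot"',
        '192.0.2.77 - - [01/Jan/2024:00:00:02 +0000] "GET /catalogsearch/result/?q=b HTTP/1.1" 200 512 "-" "bot"',
        '203.0.113.9 - - [01/Jan/2024:00:00:04 +0000] "GET /catalogsearch/result/?q=shoes HTTP/1.1" 200 512 "-" "Mozilla"',
    ]
    for directive in deny_lines(search_hits_by_range(sample)):
        print(directive)
```

Real customers use the search page too, so set the threshold high and look at what each range was searching for before you block it.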

Note: you should also set up a Cloudflare account, go to the firewall rules settings, and add a rule that stops the initial crawler from getting content to scrape in the first place. I can’t publish the filters in this post, or the scrapers will change the way they crawl your site. Contact me directly for the rules.

This image represents just 20 minutes after creating the rule.