Bots, Crawlers and Spiders, oh my!

From iGeek
The terms bots, crawlers and spiders are likely to give arachnophobes the heebie-jeebies, but they're really just an important part of the way search engines work. These automated critters go to the front page of a website and look at every link on that page, then go to each of those pages and do the same, and so on.
~ Aristotle Sabouni
Created: 2002-03-11 

Once they've visited every page on a site (crawled across it), they have a pretty good idea of what each article is about, from counting its most frequent words, and of how it should rank in importance, from looking at who links to that page or website.

Basics

Most people creating their own websites want their site to be at the top of the search engine's response list when someone searches for a particular topic. While that isn't likely, there are things you can do to move up the priority list, and it is good to understand how these things work.

There are programs referred to as bots (robots), crawlers or spiders. Their purpose is to automatically surf the web.

They go to every site, and try to follow every link they can find; then they crawl all over anything they can find there as well; hence the bug references.
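
As a rough illustration of that link-following behavior, here is a minimal sketch in Python. It assumes a made-up, in-memory "site" (the `SITE` dictionary and its page names are hypothetical) instead of fetching real pages, and it is not how any particular search engine's spider is actually written.

```python
from collections import deque

# Hypothetical toy "site": each page maps to the links found on it.
SITE = {
    "/": ["/about", "/articles"],
    "/about": ["/"],
    "/articles": ["/articles/bots", "/articles/cookies"],
    "/articles/bots": ["/articles", "/articles/cookies"],
    "/articles/cookies": [],
    "/orphan": ["/"],  # nothing links here, so the crawl never reaches it
}


def crawl(start, get_links):
    """Visit every page reachable from `start`, following links breadth-first."""
    seen = {start}          # pages already discovered, so none is visited twice
    queue = deque([start])  # pages discovered but not yet visited
    order = []              # pages in the order they were visited
    while queue:
        page = queue.popleft()
        order.append(page)
        for link in get_links(page):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order


visited = crawl("/", lambda page: SITE.get(page, []))
print(visited)
```

Note that `/orphan` never shows up in the result: a page that nobody links to can only be found if someone tells the crawler where to start.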

When you register your site with a search engine, that is all you are doing - telling their spider to start crawling your site; they will find you eventually, whether you register or not.

Search engines employ these (none-too-smart) automatons to look for anything new, and to create an "index" of what they find on each site.

They keep a list of topics and key words, and they count them up. If each page on your site has the word "computer" in it, then they can "guesstimate" that your site has something to do with computers. Then when someone searches for the word "computer", they know that your site has something to do with that topic, and you should be in the list of 15,000+ sites that also refer to computers.
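
The word counting described above can be sketched roughly like this. The page text, the `STOP_WORDS` list and the function names are all made up for illustration; real indexes are far more elaborate.

```python
import re
from collections import Counter

# Hypothetical page text, keyed by page address.
PAGES = {
    "/": "Computer tips and computer reviews for every computer owner",
    "/garden": "Growing tomatoes in a small garden",
}

# Common words that say nothing about the topic (an assumed, tiny list).
STOP_WORDS = {"and", "for", "in", "a", "the", "every"}


def count_words(text):
    """Count how often each meaningful word appears in a page."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def build_index(pages):
    """Map each word to the pages that contain it, with a count per page."""
    index = {}
    for url, text in pages.items():
        for word, count in count_words(text).items():
            index.setdefault(word, {})[url] = count
    return index


def search(index, word):
    """Return the pages containing `word`, most frequent use first."""
    hits = index.get(word.lower(), {})
    return sorted(hits, key=hits.get, reverse=True)


index = build_index(PAGES)
print(index["computer"])          # {'/': 3}
print(search(index, "computer"))  # ['/']
```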

SEO

The problem is that so many sites and pages deal with any given topic that search engines also need to figure out who should be first on the list. They need to figure out popularity and give each site (or page) some relative weight, with the most massive sites showing up higher on the list.

If you want a lot of weight quickly, the search engines will let you rent it (a form of advertising); but most of us don't have the budgets to create that artificial weight, and must do it other ways.

Search engines can do little to figure out true "popularity" and how often people visit; they can't really snoop other people's sites and see who is visiting or how often, so they resort to less direct methods.

On the extreme high end, there is some ability to poll users and figure out where they are going; but you're not likely to show up in those polls, and we're not talking about creating a site for a company that can measurably affect the nation's GDP; just a normal site.

One of the ways that search engines can guess at popularity is to just count links; they look at how many other sites are pointing to a page on your site, and from that they can rate how "popular" your site is. The more people that point to you, and the bigger they are, the more valuable your information must be, and the more weight you get. So if you want to show up better on search engines, you need to make web-friends and link to each other. Advertising banners on other sites (that have weight) don't hurt, since they add some weight (links and readers); but the banner links stop when you stop advertising.
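
To make the link counting concrete, here is a small sketch. The site names and the scoring rule (one point per linking site, plus that site's own inbound link count) are assumptions made up for this example; search engines don't share their real formulas.

```python
# Hypothetical link data: (site that links, site being linked to).
LINKS = [
    ("bigportal.example", "site-a.example"),
    ("tiny.example", "site-b.example"),
    ("friend1.example", "bigportal.example"),
    ("friend2.example", "bigportal.example"),
    ("friend3.example", "bigportal.example"),
]


def inbound_counts(links):
    """Count how many distinct other sites link to each site."""
    linkers = {}
    for source, target in links:
        if source != target:  # a site linking to itself earns nothing
            linkers.setdefault(target, set()).add(source)
    return {site: len(sources) for site, sources in linkers.items()}


def weighted_scores(links, counts):
    """Score each site: 1 point per linker, plus that linker's own count."""
    scores = {}
    for source, target in set(links):  # set() drops repeated links
        if source != target:
            scores[target] = scores.get(target, 0) + 1 + counts.get(source, 0)
    return scores


counts = inbound_counts(LINKS)
scores = weighted_scores(LINKS, counts)
print(counts["site-a.example"], counts["site-b.example"])  # 1 1
print(scores["site-a.example"], scores["site-b.example"])  # 4 1
```

Both small sites have exactly one inbound link, but the one linked from the big portal ends up with the higher score, which is the "and the bigger they are" part of the idea.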

Another way is just based on how big your site is; if the site has many articles (a lot of "content") that are all about the same subject, then you're probably getting visitors on that topic, and others would probably have interest as well.

Since where you rank on search engines can dramatically change the amount of traffic you get, and traffic translates to revenue, everyone wants to know exactly how search engines rank sites, so they can game the system (and trick them into putting their stuff at the top). So search engines tweak their algorithms, in ways they don't share. And others try to scan how results come up, to reverse engineer the important variables for moving up the rank. This cold war is about SEO: Search Engine Optimization and how to game/defeat it.

Addendum

When I wrote this, it was pretty easy to make a successful site. The Internet was still hungry for content, so you created good content, which got you good links from others, and you moved up the ranks. Nowadays, the problem has inverted: there's a ton of content out there... and people tend to not link to outside content as much (trying to keep traffic internal). So it's not just about good content, but about promoting yourself to those that will link back to you. And SEO became a much bigger business.



👁️ See also

  • Bots, Crawlers and Spiders, oh my! - Bots, crawlers and spiders are an important part of the way search engines work.
  • Cookies - You've probably heard someone reference "cookies". What are cookies, how do they work?
  • EMail - The origins, history and evolution of eMail, forums and live chat. Here are the basics of email.
  • Network Casting and Subnets - Casting and Subnets are ways to send to many people at once, but not everyone.
  • Never trust the Internet - Think of the internet as "the net of 1,000 lies". Never forget that free does not always mean "correct"!
  • Web Basics - Have you ever wondered how the Web works? The majority of the Internet and computers are actually very simple to understand.
  • Web Search Basics - The basics of searching the web, or how to use Google better.
  • What is DNS? - What Domain Names are and how they work: how to turn a human-readable Domain (like igeek.wiki) into a computer-readable IP address.
  • What is a WebApp? - What is a Web Application, and how does it vary from a traditional website?



