Googlebot is the search bot used by Google. It collects documents from the web to build a searchable index for the Google search engine.
If a webmaster wishes to restrict the information on their site available to Googlebot, or another well-behaved spider, they can do so with the appropriate directives in a robots.txt file,[1] or by adding the meta tag <meta name="Googlebot" content="noindex"> to the webpage.[2] Googlebot requests to Web servers are discernible from their user-agent string 'Googlebot'.
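As a rough illustration of how a well-behaved crawler might interpret such directives, the sketch below uses Python's standard `urllib.robotparser` module. The robots.txt contents, the `example.com` URLs, and the helper names `build_parser` and `looks_like_googlebot` are assumptions made up for this example. It does not describe how Googlebot itself is implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep Googlebot out of /private/,
# and place no restriction on any other crawler.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow:
"""


def build_parser(robots_txt):
    """Parse robots.txt text and return a parser that can answer can_fetch()."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def looks_like_googlebot(user_agent):
    """Naive check: does the user-agent string contain 'Googlebot'?

    A user-agent header can be set to anything by the client, so this
    only identifies requests that claim to come from Googlebot.
    """
    return "googlebot" in user_agent.lower()


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    for url in ("http://example.com/index.html",
                "http://example.com/private/page.html"):
        print(url, rules.can_fetch("Googlebot", url))
```

A server-side log filter could use a check like `looks_like_googlebot` to separate crawler traffic from ordinary visitors, bearing in mind that the string alone does not prove where a request came from.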
Googlebot has two versions, deepbot and freshbot. Deepbot, the deep crawler, tries to follow every link on the web and download as many pages as it can to the Google indexers. It completes this process about once a month. Freshbot crawls the web looking for fresh content. It visits websites that change frequently, according to how frequently they change. Currently Googlebot only follows HREF links and SRC links.[3]
Googlebot discovers pages by harvesting all of the links on every page it finds. It then follows these links to other web pages. New web pages must be linked to from another known page on the web in order to be crawled and indexed.
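The link-harvesting step can be sketched with Python's standard `html.parser` module. This is a minimal illustration rather than Google's actual code: the class `LinkHarvester`, the function `harvest_links`, and the sample URLs are hypothetical. It collects only the targets of `href` and `src` attributes and resolves them against the URL of the page they were found on.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkHarvester(HTMLParser):
    """Collects the targets of href and src attributes (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases attribute names, so HREF and href both match.
        for name, value in attrs:
            if name in ("href", "src") and value:
                absolute = urljoin(self.base_url, value)
                if absolute not in self.links:  # skip duplicates, keep order
                    self.links.append(absolute)


def harvest_links(html, base_url):
    """Return the absolute URLs referenced by href/src attributes in html."""
    parser = LinkHarvester(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    page = '<a href="/about.html">About</a> <img src="logo.png">'
    for link in harvest_links(page, "http://example.com/dir/index.html"):
        print(link)
```

A crawler built on this idea would add each harvested URL to a queue of pages to visit. This is why a page that no known page links to is never reached.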
A problem that webmasters have often noted with Googlebot is that it consumes a large amount of bandwidth. This can cause websites to exceed their bandwidth limit and be taken down temporarily. This is especially troublesome for mirror sites which host many gigabytes of data. Google provides "Webmaster Tools" that allow website owners to throttle the crawl rate. [1]