Mastering the robots Protocol: A Beginner's Essential

Source: unknown  Date: 2017-06-23 | Category: seo

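On February 8, 2013, the Beijing No. 1 Intermediate People's Court formally accepted Baidu's unfair-competition suit against Qihoo 360 for violating the "Robots protocol" by crawling and copying its site content, with claimed damages of up to 100 million yuan. The case can be seen as a continuation of the "3B war" of the second half of 2012. In the suit, Baidu says its robots file was already set to deny the 360 crawler, yet the 360 crawler kept crawling Baidu sites such as "Baidu Zhidao" and "Baidu Baike".

In fact, back in early November 2012, as friction between the two sides grew, twelve internet companies including Baidu, Sina, and Qihoo 360 signed the "Self-Discipline Convention for Internet Search Engine Services" under the lead of the Internet Society of China. In that convention they pledged to "follow internationally accepted industry practice and business rules and abide by the robots protocol".

So let's take this chance to talk about the robots protocol, which became famous overnight.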

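Getting to know the robots protocol

What are robots

To understand the robots protocol you first need to understand robots. The robots in this article are not Will Smith in "I, Robot", not WALL-E and EVE, and not Schwarzenegger in the "Terminator" series. What, you have never heard of these classic films? Fine, never mind. The robots here are web robots in the search engine field. The name may be unfamiliar to many people, but mention Web Wanderers, Crawlers, and Spiders and it usually clicks: they are what we collectively call crawlers or web crawlers, the programs search engines use to fetch web pages.

As everyone knows, web pages are connected to each other through hyperlinks, which gives the web its net-like structure. A crawler works like a spider moving along the links of that net. The basic flow can be simplified as follows:

1. Feed the crawler a batch of URLs, which we call seeds.
2. The crawler fetches the seeds, parses the HTML pages, and extracts the hyperlinks in them.
3. The crawler then fetches the pages that these newly discovered links point to.

Steps 2 and 3 repeat over and over.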
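The seed-fetch-extract loop can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the "web" is a hypothetical in-memory dictionary (`TOY_WEB`) instead of real HTTP requests, and `crawl` and `toy_fetch_links` are names invented for this example.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the links found on that page.
TOY_WEB = {
    "http://example.com/": ["http://example.com/a", "http://example.com/b"],
    "http://example.com/a": ["http://example.com/b", "http://example.com/c"],
    "http://example.com/b": ["http://example.com/"],
    "http://example.com/c": [],
}


def toy_fetch_links(url):
    """Stand-in for 'fetch the page, parse the HTML, extract the links'."""
    return TOY_WEB.get(url, [])


def crawl(seeds, fetch_links, max_pages=100):
    """Breadth-first crawl: visit a page, then queue every link not seen before."""
    queue = deque(seeds)
    seen = set(seeds)
    order = []
    while queue and len(order) < max_pages:
        url = queue.popleft()
        order.append(url)              # step 2: fetch and parse this page
        for link in fetch_links(url):  # step 3: follow newly found links
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order


visited = crawl(["http://example.com/"], toy_fetch_links)
```

The `seen` set is what keeps the loop from running forever when pages link back to each other.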

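What is the robots protocol

Given the flow above, you can see that a website is very passive toward crawlers: all it can do is sit there and be crawled. But sites have needs like these:

1. Some paths hold private content or are used for site administration, and the owner does not want search engines to crawl them (adult videos, say).
2. The owner dislikes a particular search engine and does not want to be crawled by it. The best-known case is Taobao not wanting to be crawled by Baidu.
3. A small site runs on shared virtual hosting where traffic is limited or billed, and the owner wants search engines to crawl gently.
4. Some pages are generated dynamically and no link points to them directly, but the owner wants their content crawled and indexed.

The owner of a site's content is the site administrator, and search engines should respect the owner's wishes. To meet the needs above, there has to be a channel through which a site can communicate with crawlers, giving the administrator a chance to express those wishes. Where there is demand there is supply, and so the robots protocol was born. The Robots protocol, formally The Robots Exclusion Protocol, is a convention about the scope of what search engines crawl on a site: whether the site wants to be crawled at all and which content must not be crawled. These rules are put in a plain text file, robots.txt, which is placed in the root directory of the site. Before crawling a site, a crawler first fetches robots.txt and then "voluntarily" crawls or skips pages accordingly. The goal is to protect site data and sensitive information and to make sure users' personal information and privacy are not violated.

Note that the robots protocol is not a formal specification, only a convention the industry has settled on. What does that mean? The Robots protocol is not a technical barrier. It is an agreement based on mutual respect, like a "No Entry" sign at the gate of a private garden: those who respect it walk around, and those who do not can still push the gate open and walk in. 360, for example.

Enough talk. Let's look at a few well-known examples to get a feel for it: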

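Example 1: Taobao

User-agent: Baiduspider
Disallow: /

User-agent: baiduspider
Disallow: /

Fellow programmers, you get it: Taobao simply does not want Baidu to crawl it.

Example 2: JD.com

User-agent: *
Disallow: /?*
Disallow: /pop/*.html

User-agent: EtaoSpider
Disallow: /

This one is not complicated either: JD.com has two paths it does not want any search engine to crawl, and it blocks etao entirely.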
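To see how a well-behaved crawler would read the Taobao rules from Example 1, here is a small check using Python's standard `urllib.robotparser`. The rules are parsed from a string rather than fetched, and the URL and the `ExampleBot` name are assumptions made up for the illustration. The JD.com rules are left out on purpose, because this standard-library parser does not interpret `*` wildcards inside paths.

```python
from urllib.robotparser import RobotFileParser

# The Taobao rules quoted in Example 1, kept as text so no network access is needed.
TAOBAO_RULES = """\
User-agent: Baiduspider
Disallow: /

User-agent: baiduspider
Disallow: /
"""


def can_crawl(user_agent, url, robots_text=TAOBAO_RULES):
    """Return True if robots_text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(user_agent, url)


# The page URL below is a placeholder, not a real Taobao address.
baidu_ok = can_crawl("Baiduspider", "https://www.taobao.com/some/page")
other_ok = can_crawl("ExampleBot", "https://www.taobao.com/some/page")
```

Here `baidu_ok` comes out `False` and `other_ok` comes out `True`: only the named crawler is shut out, and only if it chooses to obey.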

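Basic usage

Where robots.txt lives

Put simply, robots.txt just goes in the root directory of a site. The slightly tricky part is that one robots.txt can only govern the crawl policy for pages with the same protocol, the same port, and the same host. What does that mean? An example makes it clearest:

Baidu Web Search

Baidu Zhidao

The robots.txt files of these two sites are different. In other words, the crawl policies of Baidu Web Search and Baidu Zhidao are each controlled by their own robots.txt, and neither interferes with the other.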
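Since one robots.txt covers exactly one combination of protocol, host, and port, a crawler can derive the file's address from any page URL. A minimal sketch follows; `robots_txt_url` is a name chosen for this example, and the Baidu hostnames are assumed from the two sites named above.

```python
from urllib.parse import urlsplit


def robots_txt_url(page_url):
    """Build the robots.txt URL that governs page_url (same scheme, host, port)."""
    parts = urlsplit(page_url)
    # netloc keeps the host and any explicit port, e.g. "example.com:8443"
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


search = robots_txt_url("http://www.baidu.com/s?wd=robots")
zhidao = robots_txt_url("http://zhidao.baidu.com/question/1")
custom_port = robots_txt_url("https://example.com:8443/a/b.html")
```

Two pages map to the same robots.txt only when all three parts match, which is why the two Baidu sites end up with separate files.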

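What goes in robots.txt

The simplest robots.txt has only two kinds of rules:

1. User-agent: specifies which crawlers the rules apply to
2. Disallow: specifies the URLs to block

The whole file is divided into x sections, each consisting of y User-agent lines and z Disallow lines. A section means: for the y crawlers named by its User-agent lines, block the z URLs. Here x>=0, y>0, z>0. x=0 means an empty file, and an empty file is equivalent to having no robots.txt.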
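The section structure (x sections, each with y User-agent lines and z Disallow lines) can be made concrete with a tiny parser. This is a simplified sketch, not a complete robots.txt parser: `parse_sections` is a hypothetical helper that only understands the two rules introduced so far and drops sections that have no Disallow line.

```python
def parse_sections(text):
    """Split robots.txt text into a list of (user_agents, disallows) sections."""
    sections = []
    agents, disallows = [], []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if disallows:  # a User-agent line after rules starts a new section
                sections.append((agents, disallows))
                agents, disallows = [], []
            agents.append(value)
        elif field == "disallow" and agents:
            disallows.append(value)
    if agents and disallows:
        sections.append((agents, disallows))
    return sections


taobao = parse_sections(
    "User-agent: Baiduspider\nDisallow: /\nUser-agent: baiduspider\nDisallow: /\n"
)
empty = parse_sections("")
```

For the Taobao file this yields x=2 sections with y=1 and z=1 each, and the empty file yields x=0.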

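The two rules in detail:

User-agent

A crawler declares its identity when it fetches a page. That identity is the User-agent, and yes, it is the User-agent of the HTTP protocol. robots.txt uses User-agent to tell the crawlers of different engines apart.

For example, the User-agent of Google's web search crawler is Googlebot, so the following line targets Google's crawler.

User-agent: Googlebot

What if you want to target all crawlers? Listing every one is impossible, so use this line:

User-agent: *

Some readers may ask: how do I know what a crawler's User-agent is? Here is a simple list: the crawler list. You can also get official data from the search engines themselves, for example the Google crawler list and the Baidu crawler list.

Disallow

Disallow lines list the pages to block. They start with a forward slash (/) and can name a specific URL or a pattern.

To block the whole site, use a single forward slash:

Disallow: /

To block a directory and everything in it, add a forward slash after the directory name:

Disallow: /useless-directory/

To block a specific page, name that page:

Disallow: /page.html

Disallow can also use prefixes and wildcards.

To block directories a1 through a100, you can write 100 lines in the style above, or write Disallow: /a

Note, however, that this also blocks every other directory and file whose name starts with a, so use it with care. What if you need to block a1 through a100 but not a50? Think about it for a moment; we come back to this question in the next section.

To block files of a particular type (such as .gif), use the following:

Disallow: /*.gif$

* matches any number of characters and $ matches the end of the URL. I won't explain further; if this is unfamiliar, go read up on wildcards.

A reminder: Disallow values are case-sensitive. For example, Disallow: /junkfile.asp blocks junkfile.asp but allows Junk_file.asp.

Finally, not every search engine supports wildcards, so use them with care. It can't be helped: robots.txt has no standard that everyone accepts.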
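The prefix and wildcard behaviour of Disallow can be illustrated by translating a pattern into a regular expression. This sketch follows the semantics described above (`*` matches any characters, a trailing `$` anchors the end of the URL, everything else is a case-sensitive prefix). `rule_to_regex` and `is_blocked` are names invented for this example, and real crawlers may differ in the details.

```python
import re


def rule_to_regex(pattern):
    """Translate a Disallow pattern into a compiled regex anchored at the start."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal pieces and join them with '.*' where the pattern had '*'.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(path, pattern):
    """True if path matches the Disallow pattern (prefix match unless '$' is used)."""
    return rule_to_regex(pattern).match(path) is not None


gif_blocked = is_blocked("/images/cat.gif", "/*.gif$")
gif_with_query = is_blocked("/images/cat.gif?x=1", "/*.gif$")
prefix_blocked = is_blocked("/a50/index.html", "/a")
case_differs = is_blocked("/Junk_file.asp", "/junkfile.asp")
```

Note how `/a` catches `/a50/index.html` purely by prefix, which is exactly the over-blocking the text warns about.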

Example

Baidu Web Search

User-agent: Baiduspider
Disallow: /baidu
Disallow: /s?

User-agent: Googlebot
Disallow: /baidu
Disallow: /s?
Disallow: /shifen/
Disallow: /homepage/
Disallow: /cpro

User-agent: MSNBot
Disallow: /baidu
Disallow: /s?
Disallow: /shifen/
Disallow: /homepage/
Disallow: /cpro

…

Reading this should be no trouble at all by now. Incidentally, Baidu's robots.txt is rather long-winded; interested readers can try simplifying it.

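Advanced usage

A disclaimer first: not every engine's crawler supports the advanced features. Generally speaking, Google, as the technology leader among search engines, supports them best.

Example: google robots.txt

allow

Remember the earlier question? What if you need to block a1 through a100 but not a50?

Option 1:

Disallow: /a1/
Disallow: /a2/
…
Disallow: /a49/
Disallow: /a51/
…
Disallow: /a100/

Option 2:

Disallow: /a
Allow: /a50/

OK, now you know how to use allow.

By the way, what if you then want to block the file private.html under a50?

Disallow: /a
Allow: /a50/
Disallow: /a50/private.html

Clever as you are, you can surely spot the pattern: whichever rule is more specific wins.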
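The "more specific rule wins" behaviour can be sketched as a longest-prefix match. This is an illustration of the rule of thumb stated above, not a description of any particular crawler: `is_allowed` is a hypothetical helper, wildcards are ignored for brevity, and some parsers (Python's `urllib.robotparser`, for one) instead apply the first matching rule in file order.

```python
def is_allowed(path, rules):
    """rules is a list of (directive, prefix); the longest matching prefix decides."""
    best_len, allowed = -1, True  # no matching rule means the path is allowed
    for directive, prefix in rules:
        if path.startswith(prefix) and len(prefix) > best_len:
            best_len = len(prefix)
            allowed = directive.lower() == "allow"
    return allowed


# The three rules from the a50 example.
RULES = [
    ("Disallow", "/a"),
    ("Allow", "/a50/"),
    ("Disallow", "/a50/private.html"),
]

blocked_dir = is_allowed("/a1/index.html", RULES)
open_dir = is_allowed("/a50/index.html", RULES)
private_file = is_allowed("/a50/private.html", RULES)
```

With these rules `/a1/index.html` is blocked, `/a50/index.html` is allowed, and `/a50/private.html` is blocked again.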

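sitemap

As mentioned earlier, crawlers discover new pages through the links inside pages. But what about pages that no link points to? Or dynamic pages generated from user input? Can a site administrator tell search engines which pages on the site are available for crawling? That is what a sitemap is for. The simplest form of Sitemap is an XML file that lists the URLs of a site along with other data about each URL (when it was last updated, how often it changes, how important it is relative to other URLs on the site, and so on). With this information search engines can crawl the site more intelligently.

Sitemaps are a separate topic, big enough for an article of their own, so I won't go into them here; interested readers can refer to the sitemap documentation. A new question arises, though: how does a crawler know whether a site provides a sitemap file? Or, if the administrator has generated a sitemap (possibly several files), how does the crawler know where it is?

Since the location of robots.txt is fixed, people came up with the idea of putting the sitemap location in robots.txt. That made it a new member of the robots.txt family.

An excerpt from google robots.txt:

Sitemap: http://www.gstatic.com/culturalinstitute/sitemaps/www_google_com_culturalinstitute/sitemap-index.xml
Sitemap: http://www.google.com/hostednews/sitemap_index.xml

One more remark: since a site has many pages and maintaining a sitemap by hand is not very reliable, google provides tools that can generate a sitemap automatically.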
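Pulling the Sitemap locations out of a robots.txt body is a simple line scan. A minimal sketch, using the two lines quoted from google robots.txt as input; `sitemap_urls` is a name chosen for this example.

```python
def sitemap_urls(robots_text):
    """Return the URLs from every 'Sitemap:' line, in file order."""
    urls = []
    for raw in robots_text.splitlines():
        # partition splits at the first ':' only, so the URL's own colons survive
        field, sep, value = raw.strip().partition(":")
        if sep and field.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls


GOOGLE_EXCERPT = """\
Sitemap: http://www.gstatic.com/culturalinstitute/sitemaps/www_google_com_culturalinstitute/sitemap-index.xml
Sitemap: http://www.google.com/hostednews/sitemap_index.xml
"""

found = sitemap_urls(GOOGLE_EXCERPT)
```

Sitemap lines stand outside the User-agent sections, so the scan does not need to track which section it is in.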

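metatag

Strictly speaking this part does not belong to robots.txt, but it is closely related, and I don't know where else to put it, so here it goes.

robots.txt was originally meant to let site administrators manage which site content may appear in search engines. However, even if robots.txt keeps crawlers from fetching that content, search engines can still find the pages in other ways and add them to the index. For example, other sites may still link to the site. As a result, the page URL and other public information (such as the anchor text in links to the site, or the title from the Open Directory Project) may still show up in search results. So what do you do if you want to hide from search engines completely? The answer is the meta tag.

To keep a page out of the search index entirely (even when other sites link to it), use the noindex meta tag. Whenever a search engine looks at the page, it sees the noindex meta tag and keeps the page out of the index. Note that the noindex meta tag gives you a page-by-page way of controlling access to a site.

To prevent all search engines from indexing a page on your site, add this to the head section of the page:

<meta name="robots" content="noindex">

The value of name can be set to the User-agent of a particular search engine, so that only that engine is blocked.

Besides noindex there are other meta tags, such as nofollow, which forbids crawlers from following links on the page. For details see the meta tags supported by Google. One remark: noindex and nofollow are described in the HTML 4.01 specification, but how far other tags are supported differs from engine to engine, so please consult each engine's own documentation.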
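From the crawler's side, honouring a robots meta tag means parsing the fetched page and looking for `noindex`. The sketch below uses Python's standard `html.parser`; `RobotsMetaParser` and `may_index` are hypothetical names, and the logic (check the generic `robots` name plus the crawler's own name) is an assumption about how a polite crawler might behave, not any engine's documented implementation.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the content directives of every <meta name=...> tag."""

    def __init__(self):
        super().__init__()
        self.directives = {}  # lowercased meta name -> set of lowercased directives

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        content = attr.get("content") or ""
        if name:
            tokens = {t.strip().lower() for t in content.split(",") if t.strip()}
            self.directives.setdefault(name, set()).update(tokens)


def may_index(html, crawler_name="robots"):
    """False if the page carries noindex for all robots or for crawler_name."""
    parser = RobotsMetaParser()
    parser.feed(html)
    found = parser.directives
    blocked = found.get("robots", set()) | found.get(crawler_name.lower(), set())
    return "noindex" not in blocked


everyone = may_index('<head><meta name="robots" content="noindex"></head>')
google_only = '<head><meta name="googlebot" content="noindex, nofollow"></head>'
as_google = may_index(google_only, "Googlebot")
as_baidu = may_index(google_only, "Baiduspider")
```

A crawler can only see the tag after fetching the page, so a page that is also blocked in robots.txt never gets its noindex read.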

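Crawl-delay

Besides controlling what may and may not be crawled, robots.txt can also be used to control the crawl rate. How? By setting the number of seconds a crawler waits between two fetches.

Crawl-delay: 5

This means that after one fetch, the crawler must wait 5 seconds before the next one.

Note: google no longer supports this directive. Instead, webmaster tools provides a feature for controlling the crawl rate more directly.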
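For crawlers that do honour the directive, Python's standard `urllib.robotparser` can read the value with `crawl_delay()`. The sketch below parses the rule from a list of lines and then spaces out a few fetches; `ExampleBot`, the paths, and the `fetch_times` helper are made up for the illustration.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse(["User-agent: *", "Crawl-delay: 5"])

# Seconds to wait between fetches for this (hypothetical) crawler name.
delay = parser.crawl_delay("ExampleBot")


def fetch_times(urls, delay_seconds, start=0.0):
    """Earliest time, in seconds from start, at which each URL may be fetched."""
    return [(start + i * delay_seconds, url) for i, url in enumerate(urls)]


plan = fetch_times(["/a", "/b", "/c"], delay)
```

A real crawler would sleep between requests instead of building a plan, but the spacing is the same.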

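A digression: I remember that some years ago robots.txt for a while also supported more complex parameters. Visit-time allowed crawlers to visit only during the specified time window, and Request-rate limited how often URLs were read, so that different crawl rates could be used in different time periods. Presumably too few crawlers supported them, and they gradually fell out of use; interested readers can google them. As far as I know, neither google nor baidu supports these rules any more, and the smaller engine companies never seem to have supported them. If some engine does, then I am simply ill-informed, and you are welcome to let me know.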

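Does it really work?

Well, that covers most of what there is to say about robots.txt, and readers who made it this far are probably eager to try it out. Unfortunately I have to pour some cold water: can we rely entirely on robots.txt to protect our site's content? Not necessarily. Otherwise Baidu and 360 would not be in court.

Protocol consistency

The first problem is that robots.txt has no formal standard, and every search engine keeps extending what robots.txt can do. As a result, each engine supports robots.txt to a different degree, not to mention the differences in how a given feature is actually implemented.

Caching

The second problem is that robots.txt itself has to be fetched. For efficiency, crawlers generally do not fetch robots.txt before every page they fetch from a site, all the more so because robots.txt changes rarely and its content has to be parsed. The usual practice is to fetch it once, parse it, and cache it, often for quite a long time. Suppose the administrator updates robots.txt and changes some rules: the change does not take effect for a crawler right away. Only after the crawler next fetches robots.txt does it see the latest content. Awkwardly, the time of that next fetch is not under the administrator's control. Some search engines do provide web tools that let administrators notify them that a URL has changed and suggest a re-crawl. Note that this is a suggestion: even after you notify the engine, when it actually fetches is still uncertain, though it beats not notifying at all. How much better depends on the engine's conscience and technical ability.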
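The caching behaviour, and why a rule change stays invisible for a while, can be sketched with a small time-to-live cache. Everything here is an assumption for illustration: `RobotsCache` is a hypothetical class, the fetcher and the clock are injected fakes so the example runs without a network, and the one-hour TTL is arbitrary (real crawlers choose their own lifetimes).

```python
import time


class RobotsCache:
    """Cache robots.txt bodies and re-fetch only after ttl_seconds have passed."""

    def __init__(self, fetch, ttl_seconds=86400, clock=time.time):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # robots.txt URL -> (fetched_at, body)

    def get(self, robots_url):
        now = self._clock()
        cached = self._entries.get(robots_url)
        if cached and now - cached[0] < self._ttl:
            return cached[1]  # still fresh: the site is not contacted
        body = self._fetch(robots_url)
        self._entries[robots_url] = (now, body)
        return body


# Fake clock and fake site whose robots.txt changes after the first fetch.
now = [0.0]
versions = iter(["Disallow: /old/", "Disallow: /new/"])
cache = RobotsCache(lambda url: next(versions), ttl_seconds=3600, clock=lambda: now[0])

first = cache.get("http://example.com/robots.txt")
now[0] = 1800.0  # half an hour later: served from cache, change not seen
stale = cache.get("http://example.com/robots.txt")
now[0] = 7200.0  # past the TTL: re-fetched, new rules finally visible
fresh = cache.get("http://example.com/robots.txt")
```

Between the update and the expiry the crawler keeps acting on `Disallow: /old/`, which is the window the administrator cannot control.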

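ignore

The third problem: whether by accident or on purpose, some crawlers do not quite follow robots.txt or ignore it completely. Developer competence cannot be ruled out as a cause; some may simply not know robots.txt exists. Besides, robots.txt is not an enforcement mechanism. If a site has data that must stay confidential, it has to take technical measures, such as user authentication, encryption, IP blocking, and access rate control.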
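Of the technical measures listed, access rate control is the easiest to sketch. The sliding-window limiter below is one common approach, shown under stated assumptions: `RateLimiter` is a hypothetical class, clients are identified by IP string, and timestamps are passed in explicitly so the example is deterministic.

```python
from collections import defaultdict, deque


class RateLimiter:
    """Allow at most max_requests per client within any window_seconds span."""

    def __init__(self, max_requests, window_seconds):
        self._max = max_requests
        self._window = window_seconds
        self._hits = defaultdict(deque)  # client IP -> timestamps of accepted requests

    def allow(self, client_ip, now):
        hits = self._hits[client_ip]
        # Forget requests that have slid out of the window.
        while hits and now - hits[0] >= self._window:
            hits.popleft()
        if len(hits) >= self._max:
            return False
        hits.append(now)
        return True


limiter = RateLimiter(max_requests=2, window_seconds=10)
decisions = [limiter.allow("203.0.113.7", t) for t in (0, 1, 2, 11)]
```

The third request is refused because two were already accepted within ten seconds, and the fourth passes once the window has moved on. Unlike robots.txt, this works whether or not the crawler cooperates.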

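Crawling on the sly

The fourth problem: even with all kinds of restrictions in place, some malicious crawling can still break through them, for example crawling carried out through compromised machines. Put pessimistically, as long as ordinary users can access the content, this kind of malicious crawling cannot be eliminated completely. What you can do is use various means to raise the cost of crawling until the other side finds it unacceptable, for example Captcha, or Ajax asynchronous loading driven by user behavior. That is beyond the scope of this article.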

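Leaking secrets

Finally, robots.txt itself carries a risk of leaking secrets. For example, suppose a site's robots.txt suddenly gains a new line: Disallow /map/. What comes to mind? Is a map service about to launch? Curious readers will start trying all sorts of file names under that path, hoping for a surprise. Supposedly this is how google's map service was exposed ahead of launch. I am not sure about that, so take it as gossip.

Tools

google webmaster tools

robots.txt generator

Perl robots.txt parser

Python robots.txt parser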
