# ================================================================
# BLOCKED: Malicious bots, vulnerability scanners, attack tools
# ================================================================
User-agent: CensysInspect
User-agent: Gobuster
User-agent: HTTrack
User-agent: LexiBot
User-agent: Nikto
User-agent: Netsparker
User-agent: Nmap
User-agent: ScanAlert
User-agent: WinHttpRequest
User-agent: masscan
Disallow: /

# ================================================================
# BLOCKED: Aggressive scrapers and data harvesters
# ================================================================
User-agent: 360Spider
User-agent: Abonti
User-agent: Acunetix
User-agent: AhrefsBot
User-agent: AhrefsSiteAudit
User-agent: Alexibot
User-agent: Applebot-Extended
User-agent: AwarioBot
User-agent: AwarioRssBot
User-agent: AwarioSmartBot
User-agent: BacklinkCrawler
User-agent: BLEXBot
User-agent: Bolt
User-agent: BUbiNG
User-agent: Bytespider
User-agent: CatchBot
User-agent: Cliqzbot
User-agent: CrawlBot
User-agent: DataForSeoBot
User-agent: Diffbot
User-agent: DotBot
User-agent: EasouSpider
User-agent: EtaoSpider
User-agent: Exabot
User-agent: ExtractorPro
User-agent: FairShare
User-agent: FriendlyCrawler
User-agent: GarlikCrawler
User-agent: Go-http-client
User-agent: GrapeshotCrawler
User-agent: ICC-Crawler
User-agent: ImagesiftBot
User-agent: img2dataset
User-agent: InfoPathUserAgent
User-agent: ISSCyberRiskCrawler
User-agent: Iskanie
User-agent: JamesBOT
User-agent: Jyxobot
User-agent: Kraken
User-agent: LNSpidergram
User-agent: LinkpadBot
User-agent: MauiBot
User-agent: MeanPathBot
User-agent: MJ12bot
User-agent: Neevabot
User-agent: Nimbostratus-Bot
User-agent: Nutch
User-agent: OrangeBot
User-agent: Panscient
User-agent: PaperLiBot
User-agent: PetalBot
User-agent: Purebot
User-agent: R6_CommentReader
User-agent: R6_FeedFetcher
User-agent: RankActiveLinkBot
User-agent: Scrapy
User-agent: SerpstatBot
User-agent: SiteExplorer
User-agent: SurveyBot
User-agent: Spinn3r
User-agent: TurnitinBot
User-agent: VelenPublicWebCrawler
User-agent: Webzio-Extended
User-agent: WeSEE
User-agent: XoviBot
User-agent: YisouSpider
User-agent: ZoominfoBot
User-agent: ZumBot
User-agent: meanpathbot
User-agent: oBot
User-agent: woriobot
User-agent: yacybot
Disallow: /

# ================================================================
# BLOCKED: Email harvesters and contact scrapers
# ================================================================
User-agent: Aboundex
User-agent: AddThis
User-agent: EmailCollector
User-agent: EmailSiphon
User-agent: EmailWolf
User-agent: Harvest
User-agent: LeechFTP
User-agent: Metauri
User-agent: Mippin
User-agent: NetAnts
User-agent: NICErsPRO
User-agent: Teleport
User-agent: Teleport Pro
User-agent: WebCopier
User-agent: WebLeacher
User-agent: WebReaper
User-agent: WebSauger
User-agent: WebStripper
User-agent: WebWhacker
User-agent: WebZIP
User-agent: Xaldon_WebSpider
Disallow: /

# ================================================================
# BLOCKED: AI training crawlers (we allow LLM search/chat bots)
# ================================================================
User-agent: AI2Bot
User-agent: Ai2Bot-Dolma
User-agent: CCBot
User-agent: Cohere-training-data-crawler
User-agent: FacebookBot
User-agent: FaceBot
User-agent: FirecrawlAgent
User-agent: Meta-ExternalAgent
User-agent: Omgili
User-agent: Omgilibot
User-agent: Timpibot
User-agent: YouBot
Disallow: /

# ================================================================
# ALLOWED: Tools we explicitly use and trust
# ================================================================
User-agent: SemrushBot
User-agent: SemrushBot-BA
User-agent: SemrushBot-SI
User-agent: SemrushBot-SWA
User-agent: SemrushBot-CT
User-agent: SemrushBot-BM
User-agent: SplitSignalBot
User-agent: SiteimproveBot
User-agent: SiteimproveBot-Crawler
Disallow:

# ================================================================
# ALL OTHER BOTS: Standard path restrictions
# ================================================================
User-agent: *
Disallow: /star/t/parent/archives/
Disallow: /_dev
Disallow: */dev
Disallow: /academics/catalog/previous/
Disallow: /about/police/reports/
Disallow: /info/
Disallow: /mobile/
Disallow: */thankyou.php
Disallow: *-thanks-*
Disallow: *.inc
Disallow: /about/construction/buildings/
Disallow: /start/witamy/
Disallow: /start/bienvenido/
Disallow: /about/visit/pdf/parking-lot-striping-july-2018-rev.pdf
Allow: /about/visit/maps/campusmap.pdf

Sitemap: https://www.harpercollege.edu/sitemap.xml