ShopFlixBot - Crawler Documentation & Seller Integration Guide

Overview

Shopflix uses a web crawler (ShopFlixBot) to download and process XML product feeds from seller websites. This document covers the crawler's behavior, network identity, and the steps sellers (or their technical teams) need to take to ensure uninterrupted access.

User-Agent

ShopFlixBot identifies itself using the HTTP header:

User-Agent: ShopFlixBot

In some cases, it may also use a generic browser-style header:

User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/37.0.2062.94 Chrome/37.0.2062.94 Safari/537.36
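
As a rough illustration of how a seller-side filter might recognise the two headers above, the sketch below uses a plain substring test. The function name is hypothetical and not part of any Shopflix tooling. The generic browser-style header is indistinguishable from an ordinary browser, so it can only be recognised by source IP.

```python
from typing import Optional


def is_shopflixbot_user_agent(user_agent: Optional[str]) -> bool:
    """Return True when the User-Agent header names ShopFlixBot.

    Case-sensitive substring match, similar in spirit to a
    `contains "ShopFlixBot"` rule in a WAF.
    """
    return "ShopFlixBot" in (user_agent or "")


# The generic header documented above; it carries no ShopFlixBot marker.
GENERIC_UA = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) "
    "Ubuntu Chromium/37.0.2062.94 Chrome/37.0.2062.94 Safari/537.36"
)

if __name__ == "__main__":
    print(is_shopflixbot_user_agent("ShopFlixBot"))  # True
    print(is_shopflixbot_user_agent(GENERIC_UA))     # False
```

Because the second header fails this check, User-Agent matching alone is not enough; it needs to be paired with the IP allowlist described later in this guide.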

Crawled URLs

ShopFlixBot only accesses the following resources:

XML product feed - the feed URL as registered in the Merchant Portal.
Image URLs - image links referenced within the XML, required for product creation.
Product page URLs - product links referenced within the XML, used for occasional validation.
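
To see which URLs the crawler could end up requesting, a seller might list every link contained in their own feed. The sketch below is schema-agnostic: it makes no assumption about Shopflix's feed element names and simply collects any element text that looks like an http(s) URL, treating URLs with common image extensions as images. The function name, the extension list, and the sample feed are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Assumption: images are recognisable by file extension.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".webp")


def list_feed_urls(feed_xml: str) -> dict:
    """Collect http(s) URLs found in element text, split into images and others."""
    root = ET.fromstring(feed_xml)
    urls = {"images": [], "other": []}
    for element in root.iter():
        text = (element.text or "").strip()
        if not text.startswith(("http://", "https://")):
            continue
        # Ignore any query string when checking the extension.
        path = text.split("?", 1)[0].lower()
        key = "images" if path.endswith(IMAGE_EXTENSIONS) else "other"
        urls[key].append(text)
    return urls


if __name__ == "__main__":
    # Hypothetical feed fragment; real feeds will use their own element names.
    sample = (
        "<products><product>"
        "<link>https://www.example.com/p/1</link>"
        "<image>https://www.example.com/img/1.jpg</image>"
        "</product></products>"
    )
    print(list_feed_urls(sample))
```

Any host that appears in the resulting lists (for example a separate image CDN) needs the same allow rules as the feed URL itself.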

IP Addresses

All requests originate from the following IPs:

Protocol	Address
IPv4	18.193.166.252
IPv4	3.125.27.163
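
A minimal sketch of checking a request's source address against the two IPs above, using Python's standard ipaddress module. The constant and function names are hypothetical. The check assumes the server sees the true client address (behind a proxy or CDN, the address would first have to be taken from a trusted forwarding header).

```python
import ipaddress

# The two documented ShopFlixBot source addresses.
SHOPFLIXBOT_IPS = frozenset(
    ipaddress.ip_address(ip) for ip in ("18.193.166.252", "3.125.27.163")
)


def is_shopflixbot_ip(remote_addr: str) -> bool:
    """Return True when remote_addr is one of the documented ShopFlixBot IPs."""
    try:
        return ipaddress.ip_address(remote_addr) in SHOPFLIXBOT_IPS
    except ValueError:
        # Not a valid IP address at all.
        return False


if __name__ == "__main__":
    print(is_shopflixbot_ip("18.193.166.252"))
    print(is_shopflixbot_ip("203.0.113.7"))
```

Parsing the address instead of comparing raw strings avoids false negatives from formatting differences and rejects malformed input outright.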

Request Behavior

ShopFlixBot maintains a reasonable request rate at all times.
ShopFlixBot does not perform any form of authentication (no cookies, tokens, or login flows).
ShopFlixBot requires access to seller XML URLs without rate limits.

Seller-Side Configuration

To ensure uninterrupted feed access, sellers must whitelist ShopFlixBot's IPs and User-Agent in their server and security infrastructure. The instructions below cover the most common scenarios.

General (Any Firewall / WAF / Rate Limiter)

Whitelist the IPs listed above in your firewall and any rate-limiting tools.
Allow the User-Agent ShopFlixBot in any ACL or bot-filtering mechanism.
Exclude the IPs from rate limiting - ShopFlixBot needs unrestricted access to the XML feed endpoint.
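
The rate-limit exemption above can be pictured with a small fixed-window limiter that lets the ShopFlixBot IPs through unconditionally. This is only a sketch of the idea, not a recommendation for a particular product: the class name, the limits, and the injectable clock are assumptions made for illustration, and real deployments would configure the equivalent exemption in their own firewall or rate-limiting tool.

```python
import time

# The documented ShopFlixBot source addresses, exempt from limiting.
EXEMPT_IPS = frozenset({"18.193.166.252", "3.125.27.163"})


class FixedWindowLimiter:
    """Per-IP fixed-window rate limiter with an exemption list."""

    def __init__(self, max_requests, window_seconds,
                 exempt_ips=EXEMPT_IPS, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.exempt_ips = exempt_ips
        self.clock = clock
        self._windows = {}  # ip -> (window_start, request_count)

    def allow(self, ip: str) -> bool:
        # Exempt addresses are never counted or throttled.
        if ip in self.exempt_ips:
            return True
        now = self.clock()
        start, count = self._windows.get(ip, (now, 0))
        if now - start >= self.window_seconds:
            start, count = now, 0  # window expired, start a new one
        if count >= self.max_requests:
            return False
        self._windows[ip] = (start, count + 1)
        return True


if __name__ == "__main__":
    limiter = FixedWindowLimiter(max_requests=2, window_seconds=60)
    print([limiter.allow("203.0.113.7") for _ in range(3)])
    print(all(limiter.allow("18.193.166.252") for _ in range(100)))
```

The exemption is checked before any counting, so crawler traffic never consumes a window or affects limits applied to other clients.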

Cloudflare-Specific Setup

Cloudflare security features (Bot Fight Mode, Managed Challenge, custom WAF rules) frequently block or challenge ShopFlixBot requests. Both steps below are required.

Step 1 - Add IPs to IP Access Rules

Navigate to Cloudflare Dashboard → Security → WAF → Tools → IP Access Rules and add each IP with the action Allow:

18.193.166.252
3.125.27.163

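Why this is critical: Bot Fight Mode is not bypassed by WAF skip rules, so a custom rule on its own may not stop ShopFlixBot from being challenged. Adding the IPs to IP Access Rules with the action Allow is the only reliable way to bypass it. Additionally, if your custom rules apply a Managed Challenge to traffic from outside Greece, ShopFlixBot requests may be challenged unless the source IP is explicitly allowed.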
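
For sellers who manage Cloudflare through its API instead of the dashboard, Step 1 could be scripted along the lines below. This is a sketch under assumptions: it targets Cloudflare's zone-level IP Access Rules endpoint (POST /zones/{zone_id}/firewall/access_rules/rules), where the mode value "whitelist" is understood to correspond to the dashboard action Allow. Verify both against the current Cloudflare API reference before use. The notes text and function names are hypothetical, and the code only builds the request bodies; it sends nothing.

```python
import json

SHOPFLIXBOT_IPS = ["18.193.166.252", "3.125.27.163"]


def build_allow_rule(ip: str) -> dict:
    """Build one IP Access Rule request body (assumed Cloudflare v4 shape)."""
    return {
        "mode": "whitelist",  # assumed API name for the dashboard's "Allow"
        "configuration": {"target": "ip", "value": ip},
        "notes": "Allow ShopFlixBot feed crawler",  # hypothetical label
    }


def build_allow_rules(ips=SHOPFLIXBOT_IPS) -> list:
    """Build one request body per ShopFlixBot IP."""
    return [build_allow_rule(ip) for ip in ips]


if __name__ == "__main__":
    for body in build_allow_rules():
        print(json.dumps(body))
```

Each body would be sent as a separate authenticated request, one per IP, mirroring the two entries added by hand in the dashboard.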

Step 2 - Create a WAF Skip Rule for the User-Agent

Navigate to Security → WAF → Custom Rules and create a rule that skips all security checks for requests matching:

(http.user_agent contains "ShopFlixBot")

Note: If additional control mechanisms are in place (e.g., custom rules, managed challenges, third-party bot protection policies), requests may still be blocked even with this rule. Step 1 (IP Access Rules) takes precedence and should always be configured.

Testing & Troubleshooting

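To simulate a ShopFlixBot request from the command line (the request is sent from your own IP, so it exercises the User-Agent rule only, not the IP allowlist):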

curl -v -A "ShopFlixBot" -L https://www.example.com/feed.xml

If the feed is inaccessible, verify that both the IP whitelist and User-Agent skip rule are in place, and check your server/CDN access logs for blocked or challenged requests.
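
The access-log check can be partly automated. The sketch below scans log lines for requests that came from a ShopFlixBot IP or carried the ShopFlixBot User-Agent and were answered with a status that commonly indicates blocking. It assumes the widely used "combined" access-log format and treats 403, 429, and 503 as block indicators; both are assumptions to adjust for your own server or CDN, and the names used are hypothetical.

```python
import re

SHOPFLIXBOT_IPS = {"18.193.166.252", "3.125.27.163"}
# Assumption: these statuses usually mean blocked, throttled, or challenged.
BLOCK_STATUSES = {403, 429, 503}

# Assumption: "combined" log format; referer and user agent are optional.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?P<request>[^"]*)" (?P<status>\d{3}) \S+'
    r'(?: "[^"]*" "(?P<agent>[^"]*)")?'
)


def find_blocked_requests(lines):
    """Return (ip, status, request) for ShopFlixBot requests that look blocked."""
    blocked = []
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        from_bot = (
            match["ip"] in SHOPFLIXBOT_IPS
            or "ShopFlixBot" in (match["agent"] or "")
        )
        status = int(match["status"])
        if from_bot and status in BLOCK_STATUSES:
            blocked.append((match["ip"], status, match["request"]))
    return blocked


if __name__ == "__main__":
    import sys

    # Usage: python find_blocked.py < access.log
    for ip, status, request in find_blocked_requests(sys.stdin):
        print(ip, status, request)
```

Matching on either the IP or the User-Agent also surfaces blocked requests that used the generic browser-style header, which a User-Agent search alone would miss.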

Webhook Events