How-to Guide

How to add custom executors to the system

  • Understand the Executor interface.

  • Create your custom executor, similar to Requests or Selenium.

  • Add it to the CRAWLERS config. Example: Puppeteer. It should be a dict mapping name to executor class.

    from crawler.executors import Puppeteer, SeleniumOther
    
    CRAWLERS = {
        "PUPPETEER": Puppeteer,
        "SELENIUM_OTHER": SeleniumOther
    }
    
  • Override the config.

  • Restart your application

  • You should be able to use PUPPETEER as an executor in queries from the GraphQL UI.
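The Executor interface itself is not reproduced in this guide, so the following is only a minimal sketch of the registration pattern. It assumes a hypothetical base class `Executor` with a single `fetch(url)` method; `Executor`, `fetch`, `EchoExecutor` and `get_executor` are illustrative names, not the project's API, so check the real interface before copying anything.

```python
from abc import ABC, abstractmethod


class Executor(ABC):
    """Hypothetical stand-in for the project's Executor interface."""

    @abstractmethod
    def fetch(self, url: str) -> str:
        """Return the page source for `url`."""


class EchoExecutor(Executor):
    """Toy executor that builds a page instead of making a network request."""

    def fetch(self, url: str) -> str:
        return f"<html><body>{url}</body></html>"


# Same shape as the CRAWLERS setting: config key -> executor class.
CRAWLERS = {
    "ECHO": EchoExecutor,
}


def get_executor(name: str) -> Executor:
    """Instantiate the executor registered under `name` (hypothetical helper)."""
    try:
        executor_class = CRAWLERS[name]
    except KeyError:
        raise ValueError(f"unknown executor: {name!r}") from None
    return executor_class()


print(get_executor("ECHO").fetch("https://example.com"))
# prints <html><body>https://example.com</body></html>
```

Modelling the interface as an abstract base class means a subclass that forgets to implement `fetch` fails with a `TypeError` when it is instantiated, rather than partway through a crawl.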

How to add new leaves to the system

  • Understand query fields.

  • Create your custom field, similar to the example text field.

  • Add it to the LEAVES config. Example: select_options. It should be a dict mapping name to field object.

    from crawlers.fields import select_options
    
    LEAVES = {
        'select_options': select_options
    }
    
  • Override the config.

  • Restart your application

  • You should be able to use select_options as a leaf in queries from the GraphQL UI.
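This guide does not show how a field object is built, so the sketch below only illustrates the kind of extraction a `select_options` leaf might perform, using the standard library's `html.parser`. It assumes the leaf receives the HTML of a `<select>` element and returns each option's value and label; the function name `extract_select_options` and the `(value, label)` return shape are assumptions, and a real field object would wrap logic like this in whatever the query fields interface requires.

```python
from html.parser import HTMLParser


class _OptionCollector(HTMLParser):
    """Collect (value, label) pairs from <option> tags."""

    def __init__(self):
        super().__init__()
        self.options = []
        self._in_option = False
        self._value = None
        self._text = []

    def flush(self):
        # Close the current option; HTML allows </option> to be omitted.
        if self._in_option:
            label = "".join(self._text).strip()
            # Without a value attribute, an option's value is its text.
            value = label if self._value is None else self._value
            self.options.append((value, label))
            self._in_option = False

    def handle_starttag(self, tag, attrs):
        if tag == "option":
            self.flush()  # a new <option> implicitly ends the previous one
            self._in_option = True
            self._value = dict(attrs).get("value")
            self._text = []

    def handle_endtag(self, tag):
        if tag in ("option", "select"):
            self.flush()

    def handle_data(self, data):
        if self._in_option:
            self._text.append(data)


def extract_select_options(html):
    """Hypothetical leaf logic: list the (value, label) pairs of a <select>."""
    parser = _OptionCollector()
    parser.feed(html)
    parser.close()
    parser.flush()  # in case the markup ended inside an open <option>
    return parser.options
```

For example, `extract_select_options('<select><option value="r">Red</option><option>Blue</select>')` returns `[("r", "Red"), ("Blue", "Blue")]`.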

How to add additional data types

  • Understand data types.

  • Create your data type conversion function.

    The function should accept a single value and return the converted value. For example, a function that converts values to booleans:

    def boolean(value):
        """Convert an arbitrary value to a bool."""
        # Check bool before int/float: bool is a subclass of int.
        if isinstance(value, bool):
            return value
        if isinstance(value, (int, float)):
            # Any non-zero number is True.
            return value != 0
        if isinstance(value, str):
            try:
                # Numeric strings such as "0", "1" or "0.0" compare by value.
                return float(value) != 0
            except ValueError:
                # Any other string is True unless it spells "false" (any case).
                return value.strip().lower() != 'false'
        # None is False; any other object is True.
        return value is not None
    
  • Add it to the DATATYPE_CONVERSION config. Example: boolean. It should be a dict mapping name to conversion function.

    from crawlers.data_types import boolean
    
    DATATYPE_CONVERSION = {
        'boolean': boolean
    }
    
  • Override the config.

  • Restart your application

  • You should be able to use boolean as a data type in the query.
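Once registered, a conversion function is looked up by name and applied to an extracted value. The sketch below shows that shape with a second, hypothetical `integer` converter written to the same one-value-in, one-value-out contract, plus a hypothetical `convert` helper. How the project actually dispatches on DATATYPE_CONVERSION is not described in this guide, so treat the dispatch as an assumption.

```python
import re


def integer(value):
    """Hypothetical converter: extract an int from values like '1,299' or 3.7."""
    if value is None:
        return None
    if isinstance(value, (int, float)):
        # Also covers bool (True -> 1); floats are truncated toward zero.
        return int(value)
    # Strings scraped from a page often carry separators, currency or units.
    match = re.search(r"-?\d[\d,]*", str(value))
    return int(match.group().replace(",", "")) if match else None


# Same shape as the DATATYPE_CONVERSION setting: name -> function.
DATATYPE_CONVERSION = {
    "integer": integer,
}


def convert(value, data_type):
    """Apply the converter registered under `data_type` (hypothetical helper)."""
    try:
        func = DATATYPE_CONVERSION[data_type]
    except KeyError:
        raise ValueError(f"unknown data type: {data_type!r}") from None
    return func(value)
```

Here `convert("Price: $1,299.50", "integer")` returns `1299`. Truncating rather than rounding, and returning `None` for values with no digits, are arbitrary choices in this sketch.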

How to add browsers to the system

  • Understand the Browser implementation.

  • Create your custom browser, similar to GoogleChrome.

  • Add it to the BROWSER config. Example: chromium. It should be a dict mapping name to browser object.

    from crawlers.browsers import chromium
    
    BROWSER = {
        'CHROMIUM': chromium
    }
    
  • Override the config.

  • Restart your application

  • You should be able to use CHROMIUM as the browser in a Selenium query.
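The GoogleChrome implementation is not shown in this guide, so the following is only a hypothetical sketch of what a browser entry might hold: a name, a binary path and default command-line switches, registered under an upper-case key as in the BROWSER example. The `Browser` dataclass, its fields, the `launch_arguments` method and the binary path are all assumptions; a real browser class would build whatever options object the Selenium executor expects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Browser:
    """Hypothetical browser description consumed by a Selenium-style executor."""

    name: str
    binary_location: str
    arguments: tuple = ()

    def launch_arguments(self, headless=True):
        """Return the command-line switches to start the browser with."""
        args = list(self.arguments)
        if headless and "--headless" not in args:
            args.append("--headless")
        return args


# Assumed install path; adjust for your system.
chromium = Browser(
    name="chromium",
    binary_location="/usr/bin/chromium",
    arguments=("--no-sandbox", "--disable-gpu"),
)

# Same shape as the BROWSER setting: name -> browser object.
BROWSER = {
    "CHROMIUM": chromium,
}
```

Freezing the dataclass keeps a registered browser from being mutated by one query and leaking its changes into the next.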