Parser
The parser is used by the GraphQL query to parse HTML. The current system supports XPath through the lxml parser.
The library does not support Beautiful Soup because it is slower than the lxml parser, and selector-based parsing is comparatively slower than XPath.
Lxml
- class scrapqd.gql_parser.lxml_parser.LXMLParser(raw_html=None, html_tree=None)
This is the concrete lxml implementation of the gql_parser, used to parse HTML text.
- xpath_element(element, xpath=None, **kwargs)
Extracts the target nodes from the given HTML element using XPath.
- Parameters
element – Html element.
xpath – Xpath to locate the elements.
kwargs – Additional keyword arguments for extensibility.
- Returns
List[HTMLElement]
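The element-level lookup can be illustrated with lxml directly. This is a minimal sketch, not scrapqd's own code: the sample markup and variable names are assumptions made for illustration, and it only shows how an XPath evaluated against an element yields a list of matching elements.

```python
from lxml import html

# Hypothetical sample document, used only for illustration.
RAW_HTML = """
<html><body>
  <ul id="items">
    <li class="item">Apple</li>
    <li class="item">Banana</li>
  </ul>
</body></html>
"""

tree = html.fromstring(RAW_HTML)

# Pick a starting element, then evaluate a relative XPath against it.
items_list = tree.xpath("//ul[@id='items']")[0]
nodes = items_list.xpath("./li[@class='item']")

# Each result is an lxml HTML element.
tags = [node.tag for node in nodes]
```

An XPath that matches nothing returns an empty list rather than raising, so callers can iterate over the result without a separate existence check.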
- xpath_text(element, xpath, **kwargs)
Extracts text for given xpath.
- Parameters
element – Html element.
xpath – Xpath to locate the elements.
kwargs – Additional keyword arguments for extensibility.
- Returns
List[String]
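A rough equivalent of text extraction in plain lxml might look like the sketch below. Whether the library strips surrounding whitespace is not stated here, so the `strip()` call is an assumption, as is the sample markup.

```python
from lxml import html

# Hypothetical fragment, used only for illustration.
tree = html.fromstring("<div><p> First </p><p>Second</p></div>")

# text_content() returns the text of an element including its descendants.
# Stripping whitespace is an assumption, not documented library behaviour.
texts = [p.text_content().strip() for p in tree.xpath("//p")]
```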
- extract_element_source_text(element)
Extracts the source HTML content of the element.
- Parameters
element – Html element.
- Returns
String
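In plain lxml, the source markup of an element can be serialized with `lxml.html.tostring`. The sketch below assumes that a unicode string is wanted; the sample markup is hypothetical, and this is not necessarily how the library produces its result.

```python
from lxml import html

# Hypothetical fragment, used only for illustration.
tree = html.fromstring("<div><p class='note'>Hello <b>world</b></p></div>")
paragraph = tree.xpath("//p")[0]

# encoding="unicode" makes tostring return str instead of bytes.
source = html.tostring(paragraph, encoding="unicode")
```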
- extract_text(xpath, **kwargs)
Extracts text content from element.
- Parameters
xpath – Xpath to locate the elements.
kwargs – Additional keyword arguments for extensibility.
- Returns
List[String]
- extract_elements(xpath, **kwargs)
Extracts nodes from given html element.
- Parameters
xpath – Xpath to locate the elements.
kwargs – Additional keyword arguments for extensibility.
- Returns
List[HTMLElement]
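The signatures suggest that the `extract_*` methods work on the tree held by the parser, while the `xpath_*` methods take an explicit element. The sketch below illustrates that pattern with a hypothetical `MiniParser` class; it is an assumption about the design, not the actual `LXMLParser` implementation.

```python
from lxml import html


class MiniParser:
    """Hypothetical stand-in for illustration; not scrapqd's LXMLParser."""

    def __init__(self, raw_html=None, html_tree=None):
        # Accept either raw HTML text or an already parsed tree.
        if html_tree is None:
            html_tree = html.fromstring(raw_html)
        self.html_tree = html_tree

    def xpath_element(self, element, xpath=None, **kwargs):
        # Element-level helper: evaluate the XPath against the given element.
        return element.xpath(xpath)

    def extract_elements(self, xpath, **kwargs):
        # Tree-level helper: delegate using the stored tree as the element.
        return self.xpath_element(self.html_tree, xpath, **kwargs)


parser = MiniParser(raw_html="<div><a href='/a'>A</a><a href='/b'>B</a></div>")
links = parser.extract_elements("//a")
hrefs = [link.get("href") for link in links]
```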
- extract_attr(xpath, **kwargs)
Extracts attributes from the html element.
- Parameters
xpath – Xpath to locate the elements.
kwargs – Additional keyword arguments for extensibility.
- Returns
List[Dict]
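A `List[Dict]` of attributes can be built in plain lxml by copying each matched element's `attrib` mapping. This is a sketch under the assumption that there is one dictionary per matched element; the sample markup is hypothetical.

```python
from lxml import html

# Hypothetical fragment, used only for illustration.
tree = html.fromstring(
    "<div><img src='a.png' alt='A'><img src='b.png' alt='B'></div>"
)

# element.attrib behaves like a dict; dict() makes a detached copy.
attrs = [dict(img.attrib) for img in tree.xpath("//img")]
```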
- extract_form_input(xpath, **kwargs)
Extracts form inputs using given xpath. Method expects xpath to locate form node.
- Parameters
xpath – Xpath to locate the elements.
kwargs – Additional keyword arguments for extensibility.
- Returns
List[Dict]
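Since the XPath is expected to locate the form node, one way to picture the result is a dictionary of input names and values per matched form. The sketch below uses plain lxml; the shape of the dictionaries, the handling of unnamed inputs, and the sample form are all assumptions.

```python
from lxml import html

# Hypothetical login form, used only for illustration.
RAW_HTML = """
<form id="login" action="/login" method="post">
  <input type="hidden" name="csrf" value="abc123">
  <input type="text" name="user" value="guest">
  <input type="submit" value="Sign in">
</form>
"""

tree = html.fromstring(RAW_HTML)

form_inputs = []
for form in tree.xpath("//form[@id='login']"):
    # Collect name/value pairs from named inputs inside this form.
    # Skipping unnamed inputs (such as the submit button) is an assumption.
    fields = {
        inp.get("name"): inp.get("value")
        for inp in form.xpath(".//input[@name]")
    }
    form_inputs.append(fields)
```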