Ferret Versions

Declarative web scraping

v0.8.1

4 years ago

Fixed

Added existence check to CLICK and CLICK_ALL functions. #341
Added a check for whether an element is in the viewport before scrolling. #342

v0.8.0

4 years ago

Added

Delay randomization for inputs. #283
Namespace support. #269
iframe support. #315
Better emulation of user interaction. #316, #331
ESCAPE_HTML, UNESCAPE_HTML and DECODE_URI_COMPONENT functions. #318
XPath support. #322
Regular expression operator. #326
INNER_HTML_SET and INNER_TEXT_SET functions. #329
Possibility to set viewport size. #334
FOCUS function. #340

Changed

RAND accepts optional upper and lower limits. #271
Updated CDP definitions. #328
Logic of iterator termination. #330

Fixed

Order of arguments in SCROLL function. #269
The command line parameter "--param" does not support a colon. #282
Race condition during WAIT_NAVIGATION call. #281
Arithmetic operators. #298
Missed UA setting for HTTP driver. #318
Improper math operator used in calculating page load timeout. #319
Wrong function names in README. #321
JSON serialization for HTTPHeader type. #323

v0.7.0

5 years ago

Added

Autocomplete to CLI #219.
New mouse functions - MOUSE(x, y) and SCROLL(x, y) #237.
WAIT_NO_ELEMENT, WAIT_NO_CLASS and WAIT_NO_CLASS_ALL functions #249.
Computed HTMLElement.style property #255.
ATTR_GET, ATTR_SET, ATTR_REMOVE, STYLE_GET, STYLE_SET and STYLE_REMOVE functions #255.
WAIT_STYLE, WAIT_NO_STYLE, WAIT_STYLE_ALL and WAIT_NO_STYLE_ALL functions #256.
Cookie support. A document can now be loaded with preset cookies, and HTMLDocument has a .cookies property. To manipulate cookies, the COOKIE_DEL, COOKIE_SET and COOKIE_GET functions were added #242.

LET doc = DOCUMENT(url, {
    driver: "cdp",
    cookies: [{
        name: "x-e2e",
        value: "test"
    }, {
        name: "x-e2e-2",
        value: "test2"
    }]
})

Changed

Renamed ParseTYPEP to MustParseTYPE #231.
Added context to all HTML objects #235.

Fixed

Click events are not cancellable #222.
Name collision #223.
Invalid return in FQL Compiler constructor #227.
Incorrect string length computation #238.
Access to HTML object properties via dot notation #239.
Graceful process termination #240.
Browser launcher for macOS #246.

Breaking changes

New runtime type system #232.
Moved and renamed the collections.IterableCollection and collections.CollectionIterator interfaces. They are now in the core package and are called Iterable and Iterator 1af8b37.
Renamed the collections.Collection interface to collections.Measurable 1af8b37.
Moved html interfaces from runtime/values package into drivers package #234.
Changed driver initialization. Replaced the old drivers.WithDynamic and drivers.WithStatic methods with a new drivers.WithContext method that takes an optional drivers.AsDefault() parameter #234.
New document load params #234.

LET doc = DOCUMENT(url, {
    driver: "cdp"
})

v0.6.0

5 years ago

Added

Added support for context.Done() to interrupt an execution #201.
Added support for custom HTML drivers #209.
Added support for dot notation access and assignments for custom types #214.
Added ELEMENT_EXISTS(doc, selector) -> Boolean function #210.

LET exists = ELEMENT_EXISTS(doc, ".nav")

Added PageLoadParams to DOCUMENT function #214.

LET doc = DOCUMENT("https://www.google.com/", {
    dynamic: true,
    timeout: 10000
})

Fixed

Math operators precedence #202.
Memory leak in DOWNLOAD function #213.

Breaking change

(Embedded) Removed built-in driver initialization in Program #198. Initialization must now be done manually via the context.

v0.5.2

5 years ago

Fixed

Does not close a browser tab when it fails to load a page #193.
HTMLElement.value does not return an actual element value #195.
Compiles a query with a duplicate variable in FOR statement #196.
Default CDP address #197.

v0.5.1

5 years ago

Fixed

Unable to change a page load timeout #186.
RETURN doc returns an empty string #187.
Unable to pass an HTML Node without a selector to INNER_TEXT and INNER_HTML #187.
doc.innerText returns an error #187.
Panics when WAIT_CLASS does not receive all required arguments #192.

v0.5.0

5 years ago

Added

FMT function #151.
DateTime functions #152, #153, #154, #156, #157, #165, #175, #182.
PAGINATION function #173.
SCROLL_TOP, SCROLL_BOTTOM and SCROLL_ELEMENT functions #174.
HOVER function #178.
Panic recovery mechanism #158.

Fixed

Unable to define variables and make function calls before FILTER, SORT and other statements #148.
Unable to use params in LIMIT clause #173.
RIGHT should return a substring counting from the right rather than the left #164.
INNER_HTML returns outer HTML instead of inner HTML for dynamic elements #170.
INNER_TEXT returns HTML instead of text for dynamic elements #170.
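To make the RIGHT fix concrete, here is a minimal FQL sketch. It assumes RIGHT takes a string followed by a character count, as in similar query languages; the literal values are illustrative only. With the corrected behavior, the query should return the last three characters of the input.

```fql
// Take the trailing three characters of the string.
// Assumed signature: RIGHT(string, count).
LET tail = RIGHT("foobar", 3)

RETURN tail
```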

Breaking change

Name collision between the math and utils packages in the standard library. Renamed LOG to PRINT #162.
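For queries that used the old logging helper, the migration is a rename. The sketch below assumes PRINT accepts a message argument the same way the former LOG helper did; the message text is a placeholder.

```fql
// Before v0.5.0 this call would have been written as LOG("...").
PRINT("starting query")

RETURN NONE
```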

v0.4.0

5 years ago

Added

COLLECT keyword #141
VALUES function #128
MERGE_RECURSIVE function #140
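As a rough illustration of the new COLLECT keyword, the following sketch groups an inline array by one property. The data and variable names are made up for the example, and the order of the returned groups is not assumed.

```fql
// Hypothetical sample data.
LET users = [
    { name: "Ann", gender: "f" },
    { name: "Bob", gender: "m" },
    { name: "Cid", gender: "m" }
]

// Group by gender; each distinct value is returned once.
FOR u IN users
    COLLECT gender = u.gender
    RETURN gender
```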

Fixed

Unable to use string literals as object properties

v0.3.0

5 years ago

Added

FROM_BASE64 function
Support for multi line strings
DOWNLOAD function
Binary expressions

Fixed

KEEP function does not perform deep cloning
WaitForNavigation callback can get called more than once
Concurrent map iteration and map write

Breaking changes

Renamed .innerHtml to .innerHTML

v0.2.0

5 years ago

Changelog

Added

Numeric functions
PDF function
ZIP function
MERGE function
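A short sketch of how MERGE and ZIP might be used together follows. It assumes MERGE combines objects passed as separate arguments and that ZIP builds an object from an array of keys and an array of values, as their counterparts do in similar query languages; the property names are illustrative.

```fql
// Hypothetical input objects.
LET first = { a: 1, b: 2 }
LET second = { c: 3 }

RETURN {
    // Combine the properties of both objects into one.
    merged: MERGE(first, second),
    // Pair each key with the value at the same index.
    zipped: ZIP(["x", "y"], [10, 20])
}
```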