👻 Experimental library for scraping websites using OpenAI's GPT API.
Recent changes:

* Upgrade to `openai` 1.0 and `pydantic` 2.0.
* The default model list is now `['gpt-3.5-turbo', 'gpt-3.5-turbo-16k']`; the 16k model will be used automatically since the default 4k model will not be able to handle the request.
* Improve `PaginatedSchemaScraper` and add documentation for pagination.
* Improve `pydantic` schema support and give more useful error messages.
* Disable `HallucinationCheck` by default; it is overly aggressive and needs more work to be useful without raising false positives.
* Improve `HallucinationCheck` to handle more cases.
* Add `ScrapeResult` object to hold results of scraping along with metadata.
* Support `pydantic` models as schemas and for validation.
* `SchemaScraper` now provides a uniform interface for cleaning & selecting HTML.
* Use `tiktoken` for accurate token counts.
* Add `cost_estimate` utility function.
* Add cost tracking (`total_cost` attribute on `SchemaScraper` objects).
* `SchemaScraper` now takes a `max_cost` parameter to limit the total cost of a scraper.
* Improve `xpath` and `css` handling.
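The cost-tracking changes above (a `total_cost` attribute plus a `max_cost` cap) can be sketched in plain Python. This is a hypothetical illustration, not scrapeghost's actual implementation: the class name, method, and per-token prices below are made up to show the accumulate-and-cap idea.

```python
# Hypothetical sketch of per-scraper cost tracking with a hard cap.
# Class name, method, and prices are illustrative, not scrapeghost's
# real internals.


class MaxCostExceeded(Exception):
    """Raised when accumulated spend passes the configured cap."""


class CostTrackingScraper:
    # Illustrative prices per 1K tokens (not real OpenAI pricing).
    PROMPT_PRICE = 0.0015
    COMPLETION_PRICE = 0.002

    def __init__(self, max_cost=None):
        self.max_cost = max_cost   # None means "no limit"
        self.total_cost = 0.0      # accumulates across all calls

    def record_usage(self, prompt_tokens, completion_tokens):
        """Add one API call's cost to the running total, enforcing the cap."""
        cost = (prompt_tokens / 1000) * self.PROMPT_PRICE
        cost += (completion_tokens / 1000) * self.COMPLETION_PRICE
        self.total_cost += cost
        if self.max_cost is not None and self.total_cost > self.max_cost:
            raise MaxCostExceeded(
                f"total cost ${self.total_cost:.4f} exceeds cap ${self.max_cost}"
            )
        return cost


scraper = CostTrackingScraper(max_cost=0.01)
scraper.record_usage(2000, 500)  # 0.003 + 0.001 = 0.004, under the cap
```

A second call of the same size would bring the running total to 0.008, still under the cap; a third would push it past 0.01 and raise `MaxCostExceeded` instead of silently spending more.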
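The schema items above (using models as schemas, validation, a `ScrapeResult` holding extracted data) all revolve around checking LLM output against a declared shape. scrapeghost does this with `pydantic` models; the hand-rolled checker below is only a hypothetical stand-in to illustrate the idea, not the library's API.

```python
# Minimal sketch of validating extracted data against a declared schema.
# scrapeghost uses pydantic models for this; this checker is a
# hypothetical stand-in that only shows the idea.

SCHEMA = {"name": str, "url": str, "year": int}


def validate(record, schema=SCHEMA):
    """Return the record unchanged if every field exists with the right type."""
    errors = []
    for field, expected in schema.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            errors.append(f"{field}: expected {expected.__name__}")
    if errors:
        raise ValueError("; ".join(errors))
    return record


ok = validate({"name": "scrapeghost", "url": "https://example.com", "year": 2023})
```

A record with a wrong type (say, `"year": "2023"` as a string) fails with a message naming the offending field, which is the kind of "more useful error message" the schema-validation changes aim at.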