`magpie.fetchers.generic`

Module Contents

`Info`
`Content`	Content for typical webpages or blogs that are not specialized. We reproduce the main fields extracted by https://github.com/adbar/trafilatura.
`Fetcher`

class magpie.fetchers.generic.Info: Bases: magpie.datamodel.Base

class magpie.fetchers.generic.Content

Content for typical webpages or blogs that are not specialized. We reproduce the main fields extracted by https://github.com/adbar/trafilatura.

class magpie.fetchers.generic.Fetcher

clean_page(content: str) → magpie.fetchers.generic.Content: Generic webpage data extraction using trafilatura's default extraction settings.
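To illustrate what "default extraction settings" could look like in practice, here is a minimal sketch, not magpie's actual implementation. It assumes the page's HTML is passed straight to `trafilatura.extract` with no extra arguments. `PageContent` and `clean_page_sketch` are hypothetical names used only for this example, and the real `Content` class may carry more fields than the single `text` field shown here.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PageContent:
    """Hypothetical stand-in for magpie's Content; its real fields are not shown here."""
    text: Optional[str] = None


def _trafilatura_extract(html: str) -> Optional[str]:
    # Imported lazily so the rest of the sketch works without the dependency.
    import trafilatura

    # Calling extract() with only the HTML uses trafilatura's default settings.
    # It returns the main text as a string, or None if nothing could be extracted.
    return trafilatura.extract(html)


def clean_page_sketch(
    html: str,
    extract: Callable[[str], Optional[str]] = _trafilatura_extract,
) -> PageContent:
    """Run an extractor over raw HTML and wrap the result (assumed behavior)."""
    return PageContent(text=extract(html))
```

The extractor is passed in as a parameter so that it can be swapped for a stub in tests. With trafilatura installed, `clean_page_sketch(html)` falls back to the default extractor.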

fetch_additional_info(url: magpie.datamodel.Url) → magpie.fetchers.generic.Content