Wikipedia Agentic Workflow

Extract Content — Wikipedia Agentic Workflow

Extract article title, first paragraph, and main content from the current Wikipedia page

Available | free | v1.0.0 | Browser | LLM

$ sidebutton install wikipedia

A focused extractor for Wikipedia article pages. It captures the article title, the lead paragraph, and the main body content — skipping the sidebar, references, navigation boxes, and other chrome. The output is clean plain text suitable for downstream summarisation, knowledge extraction, or quoting.

Assumes the browser is already on an article page; when the starting point is a topic name, chain it after the open-article workflow. Language is preserved: if Wikipedia redirected to a localised edition, the extracted content matches that edition rather than silently switching to English.

Steps

1. Extract text from a selector (browser.extract)
   selector: #firstHeading
   as: title

2. Extract text from a selector (browser.extract)
   selector: #mw-content-text .mw-parser-output > p:not([class])
   as: first_paragraph

3. Extract text from all matching elements (browser.extractAll)
   selector: #mw-content-text .mw-parser-output > p:not([class])
   as: content
   separator: \n\n
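
To make the selectors in the steps above concrete, the following is a minimal Python sketch of what they would pick out of an article's HTML. It is not the workflow engine's implementation. It assumes that `browser.extract` returns the text of the first matching element and that `browser.extractAll` joins the text of every match with the separator; the class `ArticleExtractor`, the function `extract`, and the `SAMPLE` markup are hypothetical names introduced only for illustration. The sketch uses the standard-library `html.parser` and does not handle implicitly closed `<p>` tags.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are kept off the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ArticleExtractor(HTMLParser):
    """Approximates '#firstHeading' and
    '#mw-content-text .mw-parser-output > p:not([class])' with a tag stack."""

    def __init__(self):
        super().__init__()
        self.stack = []        # (tag, attrs) for every currently open element
        self.title = None
        self.paragraphs = []
        self._capture = None   # ("title" | "p", stack depth) while capturing
        self._buf = []

    def _matches_paragraph_selector(self, tag, attrs):
        # p:not([class]) that is a direct child of .mw-parser-output,
        # which itself sits somewhere below #mw-content-text.
        if tag != "p" or "class" in attrs or len(self.stack) < 3:
            return False
        parent_classes = (self.stack[-2][1].get("class") or "").split()
        in_content = any(a.get("id") == "mw-content-text"
                         for _, a in self.stack[:-2])
        return "mw-parser-output" in parent_classes and in_content

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        self.stack.append((tag, attrs))
        if self._capture:
            return  # already inside a captured element
        if attrs.get("id") == "firstHeading" and self.title is None:
            self._capture, self._buf = ("title", len(self.stack)), []
        elif self._matches_paragraph_selector(tag, attrs):
            self._capture, self._buf = ("p", len(self.stack)), []

    def handle_endtag(self, tag):
        # Pop back to the nearest open element with this tag name.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break
        else:
            return  # stray closing tag
        if self._capture and len(self.stack) < self._capture[1]:
            text = "".join(self._buf).strip()
            if self._capture[0] == "title":
                self.title = text
            else:
                self.paragraphs.append(text)
            self._capture = None

    def handle_data(self, data):
        if self._capture:
            self._buf.append(data)


def extract(html, separator="\n\n"):
    """Return the three values the workflow names: title, first_paragraph, content."""
    parser = ArticleExtractor()
    parser.feed(html)
    parser.close()
    paragraphs = parser.paragraphs
    return {
        "title": parser.title or "",
        # Assumption: a single-element extract yields the first match only.
        "first_paragraph": paragraphs[0] if paragraphs else "",
        "content": separator.join(paragraphs),
    }


# Hypothetical, heavily simplified article markup.
SAMPLE = """
<h1 id="firstHeading">Ada Lovelace</h1>
<div id="mw-content-text">
  <div class="mw-parser-output">
    <p class="mw-empty-elt"></p>
    <p>Ada Lovelace was an English <a href="/wiki/Mathematician">mathematician</a>.</p>
    <table class="infobox"><tr><td><p>Infobox text</p></td></tr></table>
    <p>She worked on the Analytical Engine.</p>
  </div>
</div>
"""

if __name__ == "__main__":
    print(extract(SAMPLE))
```

In this sketch the classed placeholder paragraph and the paragraph nested inside the infobox table are both skipped, which is the effect of `:not([class])` and the `>` child combinator. Because steps 2 and 3 share a selector, `first_paragraph` is simply the first entry of what ends up in `content`.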

Workflow definition

schema_version: 1
version: "1.0.0"
last_verified: "2025-12-21"
id: wikipedia_extract_content
title: "Extract Content"
description: "Extract article title, first paragraph, and main content from the current Wikipedia page"
overview: |
  A focused extractor for Wikipedia article pages. It captures the article title, the lead paragraph, and the main body content — skipping the sidebar, references, navigation boxes, and other chrome. The output is clean plain text suitable for downstream summarisation, knowledge extraction, or quoting.

  Assumes the browser is already on an article page; when the starting point is a topic name, chain it after the open-article workflow. Language is preserved: if Wikipedia redirected to a localised edition, the extracted content matches that edition rather than silently switching to English.

category:
  level: task
  domain: research
  reusable: true
policies:
  allowed_domains:
    - wikipedia.org
    - "*.wikipedia.org"
steps:
  # Extract article title
  - type: browser.extract
    selector: "#firstHeading"
    as: title

  # Extract first paragraph (the lead/summary)
  - type: browser.extract
    selector: "#mw-content-text .mw-parser-output > p:not([class])"
    as: first_paragraph

  # Extract main content (multiple paragraphs)
  - type: browser.extractAll
    selector: "#mw-content-text .mw-parser-output > p:not([class])"
    as: content
    separator: "\n\n"
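
The `allowed_domains` policy in the definition above lists a bare domain and a wildcard pattern. One plausible reading is sketched below in Python. The matching rule (case-insensitive glob matching against the URL's hostname) is an assumption, not documented behaviour of the workflow runner, and `is_allowed` is a hypothetical helper name.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

# Patterns copied from the workflow definition's policies.allowed_domains.
ALLOWED_DOMAINS = ["wikipedia.org", "*.wikipedia.org"]


def is_allowed(url, patterns=ALLOWED_DOMAINS):
    """Return True if the URL's hostname matches any allowed pattern.

    Assumption: patterns are shell-style globs applied to the whole hostname.
    """
    # urlsplit lowercases the hostname; it is None when the URL has no host.
    host = urlsplit(url).hostname or ""
    return any(fnmatchcase(host, pattern.lower()) for pattern in patterns)


if __name__ == "__main__":
    for candidate in (
        "https://en.wikipedia.org/wiki/Ada_Lovelace",
        "https://wikipedia.org/",
        "https://en.wikipedia.org.example.com/",
    ):
        print(candidate, is_allowed(candidate))
```

Under this reading, language editions such as `de.wikipedia.org` pass via the wildcard entry, while the bare `wikipedia.org` entry is needed separately because `*.wikipedia.org` requires at least a dot before the domain.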
