Searching for Wikipedia articles using Python

The Wikimedia API lets you build apps and scripts that access content from Wikipedia and other Wikimedia projects. In this tutorial, we'll use the search endpoints to find encyclopedia articles about the Solar System.

Run this sample code yourself by downloading it as a Jupyter Notebook. To get started with Jupyter Notebooks, visit PAWS.


Wikipedia is available in over 300 languages. You can request content in your language by specifying the language code in the URL. Language codes are usually two or three letters, such as en for English, he for Hebrew, and fa for Persian. To get the language code for your language, visit the list of Wikipedias. In this example, we'll request one search result from English Wikipedia, but you can change the language by setting the language code.

# Python 3
# Choose your language, and search for articles.
import requests

language_code = 'en'
search_query = 'solar system'
number_of_results = 1
headers = {
  # 'Authorization': 'Bearer YOUR_ACCESS_TOKEN',
  'User-Agent': 'YOUR_APP_NAME (YOUR_EMAIL_OR_CONTACT_PAGE)'
}

base_url = 'https://api.wikimedia.org/core/v1/wikipedia/'
endpoint = '/search/page'
url = base_url + language_code + endpoint
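# Pass the query and the result limit as URL query parameters
parameters = {'q': search_query, 'limit': number_of_results}
response = requests.get(url, headers=headers, params=parameters)
response.raise_for_status()  # Stop early if the API returns an error status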

The search content endpoint returns a pages array of page objects, giving you information about each article in the search results. The title property provides the article title as it appears on the page, while the key property gives you the title in URL-friendly format. You can use the key to construct the URL for the article using the language code and article path (/wiki). The description property provides a short summary of the topic. If a description isn't available, description will be null.

To add images to the search results, you can use the thumbnail object's url property. Like description, thumbnail is null if no thumbnail is available. A null value does not raise an exception when you read it, so we'll check each value explicitly and substitute a default description, or a default image of the Wikipedia globe, when it is missing.

# Get article title, description, and URL from the search results
import json

search_results = json.loads(response.text)

for page in search_results['pages']:
  display_title = page['title']
  article_url = 'https://' + language_code + '.wikipedia.org/wiki/' + page['key']
  # description is null (None in Python) when no description is available
  article_description = page.get('description') or 'a Wikipedia article'
  # thumbnail is also null when missing; prepend 'https:' to form a full URL
  thumbnail = page.get('thumbnail')
  if thumbnail:
    thumbnail_url = 'https:' + thumbnail['url']
  else:
    thumbnail_url = 'https://upload.wikimedia.org/wikipedia/commons/thumb/8/80/Wikipedia-logo-v2.svg/200px-Wikipedia-logo-v2.svg.png'
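One way to put these values to use is to render each result as a small HTML snippet. The following is a minimal sketch, not part of the API: the function name render_result_html, the markup, and the sample values are all assumptions made for illustration. It escapes every value before inserting it into the markup.

```python
import html

def render_result_html(title, url, description, thumbnail_url):
    """Build an HTML snippet for one search result (hypothetical helper)."""
    return (
        '<div class="search-result">'
        '<img src="{thumb}" alt="" width="60">'
        '<a href="{url}">{title}</a>'
        '<p>{description}</p>'
        '</div>'
    ).format(
        # Escape every value so special characters cannot break the markup
        thumb=html.escape(thumbnail_url),
        url=html.escape(url),
        title=html.escape(title),
        description=html.escape(description),
    )

# Illustrative values only; in the loop above you would pass display_title,
# article_url, article_description, and thumbnail_url instead.
snippet = render_result_html(
    'Solar System',
    'https://en.wikipedia.org/wiki/Solar_System',
    'The Sun & the objects that orbit it',
    'https://example.org/thumbnail.png',
)
print(snippet)
```

In a Jupyter Notebook, the resulting string could be passed to IPython.display.HTML to show the result inline.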

To get more information about how the article relates to the search query, you can use the excerpt property, which gives you a few lines from the article in HTML. In the excerpt, search terms are highlighted with span tags with class="searchmatch" to make them easy to style in your app.
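If your app displays plain text rather than HTML, the excerpt can be converted before use. The sketch below assumes the highlighted terms appear exactly as span tags with class="searchmatch", as described above; the function name excerpt_to_text, the marker character, and the sample excerpt are made up for illustration.

```python
import html
import re

# Matches the highlight spans described above and captures the search term
SEARCHMATCH = re.compile(r'<span class="searchmatch">(.*?)</span>')

def excerpt_to_text(excerpt, marker='*'):
    """Convert an HTML excerpt to plain text, wrapping search terms in a marker."""
    marked = SEARCHMATCH.sub(lambda m: marker + m.group(1) + marker, excerpt)
    without_tags = re.sub(r'<[^>]+>', '', marked)  # Drop any remaining tags
    return html.unescape(without_tags)             # Decode entities such as &amp;

# Made-up excerpt shaped like the HTML described above
sample_excerpt = (
    'The <span class="searchmatch">Solar</span> '
    '<span class="searchmatch">System</span> is the Sun &amp; the objects that orbit it'
)
print(excerpt_to_text(sample_excerpt))
```

A regular expression is enough for this simple, flat markup; for arbitrary HTML, a real parser such as html.parser from the standard library would be the safer choice.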

Search languages

Once you have a page title, you can search for pages with the same topic in different languages using the get languages endpoint.

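# Set the article title and get related articles in other languages.
from urllib.parse import quote

title = 'Solar System'
# Percent-encode the title so spaces and slashes are safe in the URL path
endpoint = '/page/' + quote(title, safe='') + '/links/language'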

url = base_url + language_code + endpoint
response = requests.get(url, headers=headers)
response = json.loads(response.text)

language_links = []

for language in response:
  link = '{name}: https://{code}.wikipedia.org/wiki/{title}'.format(
    name=language['name'], code=language['code'], title=language['key'])
  language_links.append(link)
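If you need the link for one specific language rather than the whole list, it may be easier to index the results by language code. This is a minimal sketch: the helper name links_by_code is hypothetical, and the sample data is made up to mirror the code, name, and key properties used above.

```python
from urllib.parse import quote

def links_by_code(languages):
    """Map each language code to the URL of the article in that language."""
    return {
        language['code']: 'https://{code}.wikipedia.org/wiki/{key}'.format(
            code=language['code'],
            # Percent-encode the key so non-ASCII titles form a valid URL
            key=quote(language['key'], safe=''))
        for language in languages
    }

# Illustrative sample only; in the tutorial you would pass the parsed response
sample_languages = [
    {'code': 'de', 'name': 'Deutsch', 'key': 'Sonnensystem'},
    {'code': 'fr', 'name': 'français', 'key': 'Système_solaire'},
]
links = links_by_code(sample_languages)
print(links.get('de', 'No German article found'))
```

Using dict.get with a fallback avoids a KeyError when the article has no counterpart in the requested language.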

You should now be able to use the Wikimedia API to search for articles on Wikipedia. You can also use these endpoints with other Wikimedia projects; try searching for dictionary entries, famous quotes, and more. For more information about these endpoints, see the API reference.
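As a starting point for other projects, the sketch below builds a search URL with the project as a parameter. It assumes the project name (for example wiktionary or wikiquote) simply takes the place of wikipedia in the URL path used earlier; check the API reference to confirm the supported names. The function name build_search_url is hypothetical.

```python
from urllib.parse import urlencode

API_ROOT = 'https://api.wikimedia.org/core/v1'

def build_search_url(project, language_code, query, limit=1):
    """Build a search content URL for a given project and language (a sketch)."""
    # Assumption: the project name replaces 'wikipedia' in the URL path
    query_string = urlencode({'q': query, 'limit': limit})
    return '{root}/{project}/{lang}/search/page?{qs}'.format(
        root=API_ROOT, project=project, lang=language_code, qs=query_string)

# Same query as above, aimed at English Wiktionary instead of Wikipedia
print(build_search_url('wiktionary', 'en', 'solar system'))
```

The resulting URL can be passed to requests.get along with the same headers used in the first example.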