Searching for Wikipedia articles using Python
The Wikimedia API lets you build apps and scripts that access content from Wikipedia and other Wikimedia projects. In this tutorial, we'll use the search endpoints to find encyclopedia articles about the Solar System.
Wikipedia is available in over 300 languages. You can request content in your language by specifying the language code in the URL. Language codes are usually two or three letters, such as en for English, he for Hebrew, and fa for Persian. To get the language code for your language, visit the list of Wikipedias. In this example, we'll request one search result from English Wikipedia, but you can change the language by setting the language code.
# Python 3
# Choose your language, and search for articles.
import requests
language_code = 'en'
search_query = 'solar system'
number_of_results = 1
headers = {
    # 'Authorization': 'Bearer YOUR_ACCESS_TOKEN',
    'User-Agent': 'YOUR_APP_NAME (YOUR_EMAIL_OR_CONTACT_PAGE)'
}
base_url = 'https://api.wikimedia.org/core/v1/wikipedia/'
endpoint = '/search/page'
url = base_url + language_code + endpoint
parameters = {'q': search_query, 'limit': number_of_results}
response = requests.get(url, headers=headers, params=parameters)
The search content endpoint returns an array of page objects, giving you information about each article in the search results. The title property provides the article title as it appears on the page, while the key property gives you the title in URL-friendly format. You can use the key to construct the URL for the article using the language code and article path (/wiki). The description property provides a short summary of the topic. If a description isn't available, description will be null.
To add images to the search results, you can use the thumbnail object's url property. Like description, thumbnail is null if no thumbnail is available. To account for this, we'll use a try statement that substitutes a default image of the Wikipedia globe in case of an exception.
# Get article title, description, and URL from the search results
import json
response = json.loads(response.text)
for page in response['pages']:
    display_title = page['title']
    article_url = 'https://' + language_code + '.wikipedia.org/wiki/' + page['key']
    # A null description is parsed as None and raises no exception,
    # so substitute the fallback text explicitly.
    article_description = page.get('description') or 'a Wikipedia article'
    try:
        thumbnail_url = 'https:' + page['thumbnail']['url']
    except (KeyError, TypeError):
        # thumbnail is missing or null; use the Wikipedia globe instead
        thumbnail_url = 'https://upload.wikimedia.org/wikipedia/commons/thumb/8/80/Wikipedia-logo-v2.svg/200px-Wikipedia-logo-v2.svg.png'
To get more information about how the article relates to the search query, you can use the excerpt property, which gives you a few lines from the article in HTML. In the excerpt, search terms are wrapped in span tags with class="searchmatch" to make them easy to style in your app.
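If your app shows plain text rather than HTML, one possible way to handle the excerpt is to turn the highlighted spans into text markers and strip any remaining tags. The following is a minimal sketch under that assumption; the function name excerpt_to_text, the asterisk marker, and the sample excerpt are all hypothetical and written by hand for illustration, not taken from a real API response.

```python
import html
import re

# Matches the highlight spans that wrap search terms in an excerpt.
SEARCHMATCH = re.compile(r'<span class="searchmatch">(.*?)</span>')
# Matches any other HTML tag that may remain in the excerpt.
ANY_TAG = re.compile(r'<[^>]+>')

def excerpt_to_text(excerpt, marker='*'):
    """Convert an HTML excerpt to plain text (hypothetical helper).

    Search matches are wrapped in `marker`, other tags are removed,
    and HTML entities such as &amp; are decoded.
    """
    text = SEARCHMATCH.sub(lambda m: marker + m.group(1) + marker, excerpt)
    text = ANY_TAG.sub('', text)
    return html.unescape(text)

# Hand-written sample excerpt, not real API output.
sample_excerpt = (
    'The <span class="searchmatch">Solar</span> '
    '<span class="searchmatch">System</span> is the Sun &amp; '
    'the objects that orbit it'
)
print(excerpt_to_text(sample_excerpt))
```

In a web app you would more likely keep the HTML as it is and style the searchmatch class with CSS; this conversion is only useful for terminals, logs, or other plain-text output.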
Search languages
Once you have a page title, you can search for pages with the same topic in different languages using the get languages endpoint.
# Set the article title and get related articles in other languages.
title = 'Solar System'
endpoint = '/page/' + title + '/links/language'
url = base_url + language_code + endpoint
response = requests.get(url, headers=headers)
response = json.loads(response.text)
language_links = []
for language in response:
    link = '{name}: https://{code}.wikipedia.org/wiki/{title}'.format(
        name=language['name'], code=language['code'], title=language['key'])
    language_links.append(link)
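Once you have the list of language objects, you may want the link for one specific language rather than all of them. The sketch below shows one way this could look. The helper find_language_link is hypothetical, and sample_languages is a hand-written stand-in for the parsed response that uses only the code, name, and key properties shown above. The key is percent-encoded here so that titles with non-ASCII characters produce a safe URL.

```python
from urllib.parse import quote

def find_language_link(languages, code):
    """Return the article URL for the given language code, or None.

    `languages` is assumed to be a list of dicts with 'code' and 'key'
    properties, like the parsed response from the get languages endpoint.
    """
    for language in languages:
        if language['code'] == code:
            # Percent-encode the key so non-ASCII titles are URL-safe.
            return 'https://{code}.wikipedia.org/wiki/{key}'.format(
                code=language['code'], key=quote(language['key']))
    return None

# Hand-written sample data for illustration, not real API output.
sample_languages = [
    {'code': 'de', 'name': 'Deutsch', 'key': 'Sonnensystem'},
    {'code': 'fr', 'name': 'français', 'key': 'Système_solaire'},
]

print(find_language_link(sample_languages, 'fr'))
print(find_language_link(sample_languages, 'xx'))
```

Returning None for a missing language lets the caller decide whether to fall back to another language or skip the link.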
You should now be able to use the Wikimedia API to search for articles on Wikipedia. You can also use these endpoints with other Wikimedia projects; try searching for dictionary entries, famous quotes, and more. For more information about these endpoints, see the Core REST API.
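To try another project, it may be enough to swap the project name in the request URL. The sketch below assumes that the URL follows the same pattern as the Wikipedia example in this tutorial, with the project name in place of wikipedia, and that names such as wiktionary and wikiquote are accepted; check the Core REST API documentation to confirm both. The helper build_search_url is hypothetical and only builds the URL, so you can inspect it before sending a request.

```python
from urllib.parse import urlencode

# Assumed URL pattern, modeled on the Wikipedia search example above.
API_ROOT = 'https://api.wikimedia.org/core/v1'

def build_search_url(project, language_code, search_query, number_of_results=1):
    """Build a search URL for a Wikimedia project (hypothetical helper)."""
    parameters = urlencode({'q': search_query, 'limit': number_of_results})
    return '{root}/{project}/{language}/search/page?{parameters}'.format(
        root=API_ROOT, project=project, language=language_code,
        parameters=parameters)

# Project names here are assumptions; confirm them in the API documentation.
print(build_search_url('wiktionary', 'en', 'solar system'))
print(build_search_url('wikiquote', 'en', 'solar system', 3))
```

You can pass the resulting URL to requests.get along with the same headers used earlier in this tutorial.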