Getting featured content from Wikipedia with Python
Many Wikipedias include a daily featured article and other curated content on their homepages. You can see an example of this on English Wikipedia and Hebrew Wikipedia. The featured content endpoint lets you access this content programmatically, adding high-quality, multilingual content to your apps. In this tutorial, you'll use the API to extract information about today's featured article, featured image, and latest news.
The Wikimedia API supports featured content in over 12 languages based on the type of content. You can see language availability for each type of featured content by visiting the Feed API docs. While a Wikipedia may include featured content on its main page, not all Wikipedias are integrated into the Wikimedia API.
API authentication: These examples use a User-Agent header to identify the source of the request and an OAuth 2.0 access token for authentication. To learn about authentication for the Wikimedia API, read the authentication guide.
Today's featured article
Editors from across Wikipedias select and curate quality articles to feature on their wiki's homepage. The featured content endpoint allows you to access the daily featured article in over 10 languages. To use this endpoint, you'll need today's date in YYYY/MM/DD format and the language code. Language codes are usually two or three letters, such as en for English, he for Hebrew, and fa for Persian. To get the language code for your language, review the list of supported languages.
# Python 3
# Get today's date in YYYY/MM/DD format.
import datetime
today = datetime.datetime.now()
date = today.strftime('%Y/%m/%d')
# Choose your language, and get today's featured content.
import requests
language_code = 'en' # English
headers = {
    'Authorization': 'Bearer YOUR_ACCESS_TOKEN',
    'User-Agent': 'YOUR_APP_NAME (YOUR_EMAIL_OR_CONTACT_PAGE)'
}
base_url = 'https://api.wikimedia.org/feed/v1/wikipedia/'
url = base_url + language_code + '/featured/' + date
response = requests.get(url, headers=headers)
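If you request featured content for several dates or languages, it can help to build the URL in one place. The following is a minimal sketch rather than part of the official example: build_featured_url is a hypothetical helper name, and it only reproduces the URL pattern used above.

```python
import datetime

BASE_URL = 'https://api.wikimedia.org/feed/v1/wikipedia/'


def build_featured_url(language_code, day):
    """Return the featured content URL for a language code and a date.

    Hypothetical helper: it follows the same pattern as the example
    above, base URL + language code + '/featured/' + YYYY/MM/DD.
    """
    # strftime zero-pads the month and day, which gives YYYY/MM/DD.
    return BASE_URL + language_code + '/featured/' + day.strftime('%Y/%m/%d')


# Build the URL for today's featured content on Hebrew Wikipedia.
print(build_featured_url('he', datetime.date.today()))
```

You can pass the resulting URL to requests.get(url, headers=headers) as before. Calling response.raise_for_status() afterwards raises an exception on a 4xx or 5xx response, so that an expired token or an unsupported language is reported before you try to read the JSON.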
Once you've made the request, you can extract information about today's featured article (tfa) from the JSON response. In this example, we'll use the title of the article, the link to the original article on Wikipedia, a short extract, and a thumbnail of the article's lead image. To satisfy the attribution requirements for Wikipedia's CC BY-SA 4.0 license, make sure to include a link back to the original article on Wikipedia.
# Get the featured article's title, URL, extract, and thumbnail.
import json
response = json.loads(response.text)
display_title = response['tfa']['titles']['display']
desktop_url = response['tfa']['content_urls']['desktop']['page']
extract_html = response['tfa']['extract_html']
thumbnail_url = response['tfa']['thumbnail']['source']
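Because language availability differs by content type, a response may not include every key. Here is a defensive sketch of the same extraction. It assumes that a missing featured article shows up as an absent tfa key and that the thumbnail may be absent too. The function name summarize_featured_article and the sample data are made up for illustration.

```python
def summarize_featured_article(feed):
    """Return the featured article's fields, or None if 'tfa' is missing.

    `feed` is the parsed JSON response as a dict.
    """
    article = feed.get('tfa')
    if article is None:
        return None
    return {
        'title': article['titles']['display'],
        'url': article['content_urls']['desktop']['page'],
        'extract_html': article['extract_html'],
        # Assumption: the thumbnail may be absent, so fall back to None.
        'thumbnail_url': article.get('thumbnail', {}).get('source'),
    }


# Hypothetical, trimmed-down response used only to show the shape.
sample_feed = {
    'tfa': {
        'titles': {'display': 'Example article'},
        'content_urls': {
            'desktop': {'page': 'https://en.wikipedia.org/wiki/Example_article'}
        },
        'extract_html': '<p>An example extract.</p>',
    }
}

summary = summarize_featured_article(sample_feed)
print(summary['title'], summary['url'])
```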
Today's featured image
Wikimedia Commons is a collection of over 60,000,000 freely usable media files, many of which are used in Wikipedia articles. The featured content endpoint includes information about the daily featured image from Wikimedia Commons.
When reusing a free image, review the license to provide the correct attribution information. For most free content licenses, you can correctly attribute the work by listing the artist and the license and by linking to the file page on Wikimedia Commons. The Wikimedia API currently supports featured image descriptions in English only.
# Get the featured image's thumbnail, description, license, and attribution information.
thumbnail_url = response['image']['thumbnail']['source']
description_html = response['image']['description']['html']
artist_name = response['image']['artist']['text']
attribution_url = response['image']['file_page']
license_name = response['image']['license']['type']
license_url = response['image']['license']['url']
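To show how these fields fit together, here is one possible way to assemble a credit line in HTML. This is a sketch, not a required attribution format: build_image_credit is a hypothetical helper, the sample values are invented, and you should still check the license terms for the specific image.

```python
import html


def build_image_credit(image):
    """Build an HTML credit line from the 'image' part of the response."""
    # Escape every value before placing it into HTML markup.
    artist = html.escape(image['artist']['text'])
    license_name = html.escape(image['license']['type'])
    license_url = html.escape(image['license']['url'])
    file_page = html.escape(image['file_page'])
    return (
        '<a href="' + file_page + '">Image</a> by ' + artist + ', '
        '<a href="' + license_url + '">' + license_name + '</a>'
    )


# Invented sample data with the same fields as the example above.
sample_image = {
    'artist': {'text': 'Jane Doe'},
    'file_page': 'https://commons.wikimedia.org/wiki/File:Example.jpg',
    'license': {
        'type': 'CC BY-SA 4.0',
        'url': 'https://creativecommons.org/licenses/by-sa/4.0',
    },
}

credit = build_image_credit(sample_image)
print(credit)
```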
In the news
Some Wikipedias feature the latest news stories on their homepages, which we can also access using the featured content endpoint. Note that in-the-news content is only available for the current date. For example, to get the news stories featured on the main page of English Wikipedia:
# Choose language
language_code = 'en' # English
url = base_url + language_code + '/featured/' + date
response = requests.get(url, headers=headers)
# Get a list of headlines
headlines = []
response = json.loads(response.text)
for story in response['news']:
    headline = story['story']
    # Replace relative URLs with absolute URLs
    headline = headline.replace('"./', '"https://' + language_code + '.wikipedia.org/wiki/')
    headlines.append(headline)
By changing the language code, we can get today's news from another Wikipedia. Note that the featured stories aren't consistent between languages. This is because each Wikipedia is an independently managed community, and the stories featured depend on available articles and cultural context. You may also notice slight formatting differences between wikis.
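Since only some Wikipedias feature news, code that loops over several languages may want to tolerate a response without news stories. The sketch below wraps the headline logic in a function that takes the parsed response. It assumes a wiki without in-the-news content simply omits the news key. The name extract_headlines and the sample data are hypothetical.

```python
def extract_headlines(feed, language_code):
    """Return the news headlines from a parsed response, with absolute links."""
    wiki_prefix = '"https://' + language_code + '.wikipedia.org/wiki/'
    headlines = []
    # Assumption: a response without news stories has no 'news' key.
    for story in feed.get('news', []):
        # Replace relative URLs with absolute URLs for this wiki.
        headlines.append(story['story'].replace('"./', wiki_prefix))
    return headlines


# Invented sample data that mimics the structure used above.
sample_feed = {
    'news': [
        {'story': '<a href="./Example_event">An example event</a> takes place.'},
    ]
}

print(extract_headlines(sample_feed, 'he'))
print(extract_headlines({}, 'fa'))
```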
More featured content
The Feed API supports additional types of featured content, including the previous day's most read articles, a curated set of events that occurred on the given date, fixed holidays celebrated on the given date, and more.
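As an illustration, the most read articles could be pulled from the same parsed response. This sketch rests on assumptions that are not confirmed in this tutorial: that the response has a mostread key holding an articles list, and that each entry carries titles and views fields. Check the Feed API docs for the actual structure. The function name top_most_read and the sample data are hypothetical.

```python
def top_most_read(feed, limit=5):
    """Return (title, views) pairs for the first `limit` most read articles.

    Assumed structure: feed['mostread']['articles'] is a list of dicts
    with 'titles' and 'views' entries.
    """
    articles = feed.get('mostread', {}).get('articles', [])
    return [
        (article['titles']['display'], article['views'])
        for article in articles[:limit]
    ]


# Invented sample data following the assumed structure.
sample_feed = {
    'mostread': {
        'articles': [
            {'titles': {'display': 'First example'}, 'views': 1200},
            {'titles': {'display': 'Second example'}, 'views': 800},
        ]
    }
}

print(top_most_read(sample_feed, limit=1))
```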