Introduction to APIs: Why and when to use them, and how to get started

Why APIs?

Previously, we manually examined a sample of just 20 texts and tried using an LLM for sentiment analysis. How long did this take you? Would it still be doable if there were 75 texts? Likely, yes. However, imagine you have 10,000 texts to analyze for sentiment. Analyzing them one by one (or copying them into a tool manually) would be close to impossible: it would take many hours and be prone to error. We need a way to automate and scale the process. This is where APIs come in. With an API, we can send those thousands of texts from code to a powerful external service that analyzes sentiment and returns each result within seconds. A nice example from economics research is a recent working paper that used over 1,400 American life narratives from the 1930s as data to uncover common themes about what it means to live a meaningful life.

Our baseline approach uses Python, but R is very similar (see below).

First, the architecture of the web: client, server, request, response

Before talking about APIs, it helps to know the basic shape of how anything on the web talks to anything else. This is the foundation – APIs are just one specific use of it.

Whenever you load a webpage, your browser (the client) sends a short message – a request – to another computer (the server) asking for something. The server does whatever work is needed and sends back a response: a status code (e.g. 200 OK if it worked, 404 Not Found if the thing isn’t there) plus the actual content.

Two pieces describe a request:

a URL, which is the address of what you want (e.g. https://example.com/data), and
an HTTP method, which says what you want to do with it. The two you’ll see most are GET (fetch something) and POST (send something).

That round trip – client sends request, server sends response – is the request-response cycle, and it’s the same whether your browser is loading a news site or your Python script is asking an API for sentiment scores. Each request is stateless: the server doesn’t remember the previous one, so each call has to carry everything it needs.
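To make the request-response cycle concrete, here is a minimal sketch that runs both sides on your own machine, using only Python's standard library. The toy server, its /data path, and the fetch helper are made up for illustration; they are not part of any real API.

```python
import json
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Toy server: answers GET /data with JSON, anything else with 404."""

    def do_GET(self):
        if self.path == "/data":
            body = json.dumps({"message": "hello"}).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch(port, method, path):
    """Client side: send one request, return (status, reason, body)."""
    conn = HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request(method, path)
        resp = conn.getresponse()
        return resp.status, resp.reason, resp.read().decode("utf-8", errors="replace")
    finally:
        conn.close()


# Start the server on any free local port, in a background thread.
server = HTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_port

# Two independent (stateless) requests: one that works, one that does not.
ok_status, ok_reason, ok_body = fetch(port, "GET", "/data")
missing_status, missing_reason, _ = fetch(port, "GET", "/nothing-here")
print(ok_status, ok_reason, ok_body)
print(missing_status, missing_reason)

server.shutdown()
server.server_close()
```

Each call to fetch opens a fresh connection and carries its own method and path, which is the statelessness described above in miniature.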

Julia Evans has a nice short illustrated explainer: How URLs work, with a longer “HTTP” zine if you want to go deeper.

Once you have this picture in your head, an API stops being mysterious: it’s the same client, server, request, response – just with a machine on the client side instead of a person.

What is an API?

An API (Application Programming Interface) is like a messenger or middleman that lets two different programs talk to each other and exchange information. Instead of a person directly doing a task, you have one software program asking another program to do something on its behalf. A popular analogy is that an API is similar to a restaurant waiter:

You (the client) are sitting at a table, ready to order a meal (you have a request for information or a service).
The waiter (the API) takes your order and relays it to the kitchen. You don’t go into the kitchen yourself – the waiter is the go-between.
The kitchen (the server) is where the work happens. The chef prepares the meal (the data or service you requested).
The waiter (API) returns with your meal and serves it to you. You get exactly what you ordered, without having to know how the kitchen prepared it.

In this analogy, the restaurant’s menu is like the API documentation – it lists what you can ask for and how to ask for it. If you request something not on the menu, the waiter (API) will tell you it’s not available (an error). Similarly, an API provides a set of rules and endpoints that define what requests can be made and what responses you can expect.

This means you don’t need to know the complex inner workings of the server or service. You just need to know what to ask for and how to ask for it through the API. The API handles the communication, just as the waiter handles communication between you and the kitchen.

Benefits of Using APIs in Data Analysis

Why use APIs as a data analyst? Here are some key benefits:

Scalability: APIs let you process large volumes of data quickly. You can automate requests in code, so analyzing 10,000 texts or more becomes feasible. Instead of manually working with each piece of data, you let a server handle the heavy lifting.
Access to Powerful Tools: Many companies provide APIs for advanced services like sentiment analysis, language translation, image recognition, or data storage. As a data analyst, you can tap into these pre-built models and services without having to develop them from scratch.
Time and Effort Savings: Using an API, you can perform complex tasks with just a simple request. This saves you the time of writing extensive code or doing repetitive work. For example, rather than writing your own sentiment analysis algorithm, you can send text to an API and get sentiment results immediately.
Integration of Data Sources: APIs allow different software and datasets to integrate. You can pull data from different sources (e.g. Twitter’s API for tweets, a weather API for climate data) directly into your analysis pipeline. This marries data from multiple sources seamlessly.
Consistency and Reliability: When you use a well-established API, you benefit from a service that’s been tested and optimized. The API will handle errors, edge cases, and updates, so you get consistent results. It’s like outsourcing a task to an expert – you trust the API to do its job correctly.

API Keys and Authentication

Most APIs require some form of authentication to ensure that only authorized users or applications can use them. The simplest form is an API key. An API key is like a secret password or ID that you include with your API calls:

You typically get an API key by creating an account or registering an application with the API provider. For example, to use the Twitter API or OpenAI API, you’d sign up and receive a key (or token).
The key itself is usually a long string of characters (letters, numbers, and symbols). It’s unique to you or your application.
You include this key with every request. Often it goes in a request header (for instance, you might set a header Authorization: Bearer YOUR_API_KEY), or sometimes as a URL parameter (e.g., ?api_key=YOUR_API_KEY in the query string). The API documentation will tell you exactly how to include the key.
The server checks the key. If the key is missing or wrong, the API will usually respond with an authentication error (like a 401 Unauthorized status). If the key is valid, the server will proceed to handle your request.
Security tip: Never share your API keys publicly or commit them to public repositories. They are meant to be kept secret. If someone obtains your key, they could use the API pretending to be you, which might use up your usage limits or incur costs on your account.

Some services use more complex authentication (like OAuth tokens which have limited scope or expiration), but an API key is the fundamental concept to understand first. It’s your access credential for using the API.
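The two ways of attaching a key can be sketched as follows. This is an illustration only: the environment variable name MY_SERVICE_API_KEY, the helper functions, and the example.com URL are hypothetical, and a real API's documentation decides which style it accepts.

```python
import os
from urllib.parse import urlencode


def load_api_key(env_var="MY_SERVICE_API_KEY"):
    """Read the key from the environment so it never appears in the code."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key


def bearer_headers(api_key):
    """Style 1: send the key in a request header."""
    return {"Authorization": f"Bearer {api_key}"}


def url_with_key(base_url, api_key, **params):
    """Style 2: send the key as a query-string parameter."""
    query = urlencode({**params, "api_key": api_key})
    return f"{base_url}?{query}"


# "YOUR_API_KEY" is a placeholder; in real code, use load_api_key() instead.
headers = bearer_headers("YOUR_API_KEY")
url = url_with_key("https://example.com/data", "YOUR_API_KEY", q="inflation")
print(headers)
print(url)
```

The header style is generally preferable when the API offers both, because URLs tend to end up in logs and browser histories.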

You usually don’t write the HTTP yourself: API libraries

Most popular APIs ship a client library (sometimes called an SDK) for Python and other languages. A client library is a thin wrapper that hides the URL, the headers, the API key, and the JSON parsing behind ordinary function calls. You set things up once, then you call methods like you’d call any other Python function.

Concretely, here’s the same API call done two ways. First, the raw HTTP version – you build the URL, attach your key, send it, and parse JSON back:

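import os

import requests

resp = requests.post(
    "https://api.openai.com/v1/responses",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={"model": "gpt-4o-mini", "input": "Is this review positive or negative?"},
    timeout=30,  # don't wait forever if the server is slow
)
resp.raise_for_status()  # stop here on an error status such as 401 or 429
# Dig the generated text out of the nested JSON response
text = resp.json()["output"][0]["content"][0]["text"]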

Now the client-library version. Same task, but the library handles the URL, the auth header, and the JSON for you:

from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from your environment
resp = client.responses.create(
    model="gpt-4o-mini",
    input="Is this review positive or negative?",
)
text = resp.output_text

The two versions hit the same endpoint and return the same kind of response (the exact wording of a model’s answer can differ from call to call). The library version is shorter and easier to read. More importantly, the library deals with many edge cases (auth refreshes, retries, rate limits, response schema changes) so you don’t have to.
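One reason hand-written parsing is brittle: the raw version indexes several levels deep into the JSON, and any missing level raises an exception. A small defensive helper is sketched below. The sample payloads are invented for illustration and only mirror the nesting used in the raw example; they are not a complete or authoritative picture of the real response format.

```python
import json


def extract_text(payload):
    """Return the first text chunk, or None if the response has another shape."""
    try:
        return payload["output"][0]["content"][0]["text"]
    except (KeyError, IndexError, TypeError):
        return None


# Hypothetical payloads: one with the expected nesting, one error-style reply.
sample_ok = json.loads(
    '{"output": [{"content": [{"type": "output_text", "text": "Positive"}]}]}'
)
sample_error = {"error": {"message": "Invalid API key"}}

good = extract_text(sample_ok)
bad = extract_text(sample_error)
print(good)
print(bad)
```

A client library does this kind of checking for you and exposes the result as a simple attribute, which is what resp.output_text does in the library version.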

A few practical notes:

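Many major services have Python libraries, either official or community-maintained: openai, anthropic for LLMs; google-cloud-*, boto3 (AWS), azure-* for cloud platforms; tweepy (X/Twitter), praw (Reddit), yfinance, fredapi for data sources. The last four are community projects rather than official ones, so check how actively they are maintained.
You install them with pip install <name>. Those that need an API key can usually read it from an environment variable, so you never have to hard-code it.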
When no client library exists, fall back to requests and call the HTTP endpoint directly. The result is the same; you just write a few more lines.

For a worked LLM example end-to-end (loading a key, sending text, structured output, batching), see Calling LLM APIs from Python.

Get an API key

Next, you can go and get an API key for an AI service. A few dollars of credit will be enough to start. Follow the instructions.

Walkthrough examples

Getting GDP data from World Bank and FRED
More advanced: football data (Python, R)
Practical LLM API calls from Python

What’s different in R

The picture in R is essentially the same. For raw HTTP, httr2 (or the older httr) plus jsonlite play the role of requests in Python. For wrappers, many services have an R package – ellmer for LLMs, fredr for FRED, rtweet for X/Twitter, etc. The core idea doesn’t change: a library abstracts the request-response mechanics into native R functions, so you focus on the analysis rather than the plumbing.

Scaling Up with APIs: From 20 to 10,000 and Beyond

The introduction of APIs into your workflow transforms what you can accomplish:

Tasks that were infeasible by hand become trivial to automate. You could get results in minutes or hours rather than weeks.
You can harness powerful algorithms provided by industry leaders. For example, instead of developing your own machine learning model, you can use Google’s vision API to tag images or OpenAI’s language API to summarize text. This means you can tackle complex problems without needing to be an expert in those specific subfields.
You can work with real-time and large-scale data. Want to analyze football statistics or financial market data? There are APIs to fetch those streams of information. With APIs, you are not limited to data you can collect manually; you can pull in data from all over the world programmatically.

APIs are a bridge to practically unlimited data and capabilities. They let your programs communicate with other services to get things done efficiently. As we continue this course, you’ll get hands-on experience using APIs – turning the concepts you learned here into actual data analysis tasks. Embrace this new tool in your skillset. Whenever you find yourself needing to scale up or access a specialized service, think: Is there an API for that? Chances are, the answer will be yes, and now you’ll know how to use it!

More advanced and supplementary information.
