Which AI model shall we chose?

Published

January 12, 2026

In what follows, here is my personal take as of date:2026-01-12.

Basics

Generative AI based on Large Language Models (genAI) is great for many tasks. In this course we only focus on aspects of Data Analysis:

  • designing analysis
  • writing code
  • data wrangling such as joining tables, sample design, variable transformations
  • exploratory data analysis
  • modelling, regressions, machine learning
  • causal inference
  • creating tables and graphs
  • writing reports and presentations

Basics

  • Most models have a free and paid tiers. Free ones are good as kind of Google search replacement. For serious work you’ll be better off with paid tiers.
  • As of now the leading models are Google’s Gemini 3.0, OpenAI’s GPT-5.2, and Anthropic’s Claude Opus 4.5.
  • Each model have faster (cheaper) and thinking (more expensive) variants. For data analysis work, the thinking variants are recommended.
  • All major models have deep research
  • All major models support tool use (e.g. web browsing, code execution) and agentic patterns (multi-step workflows with memory and tool use).
  • Open weights models are also quite good, Deepseek and Q. You can even run a 7bn parameter model on an expensive computer

Flagship models: 2026 January 12

landscape overview

Feature GPT-5.2 o3 (reasoning) Gemini 3 Deep Think Claude Opus 4.5
Design focus Flagship multimodal with native reasoning Deep reasoning for complex tasks 1M token context with extended thinking Coding & complex instruction following
Multimodality Text + images + audio + video (I/O) Text + tool calls Text + images + video Text + images + Computer Use 2.0
Browsing / tools Integrated search + MCP support Auto-search for up-to-date facts Grounded in Google Search Deep integration with MCP & OS
Default style Balanced, adaptive Concise, source-cited Thorough, exploratory Professional, highly structured
Context window 400k tokens 128k tokens 1M tokens 500k tokens
Strengths All-rounder, agentic workflows Step-wise analysis, audit trail Long-context research Coding agents, OS automation
Weak spots Expensive for simple tasks No native media I/O Slower response times Slower than Haiku/Sonnet

Models for data analysis

For data-analysis projects, the recommended approach is:

  • Reasoning models (o3, Deep Think) for complex analysis requiring accuracy
  • GPT-5.2 for agentic workflows and multimodal tasks
  • Gemini 3 for long-context document analysis (entire datasets, reports)
  • Claude Opus 4.5 for coding and complex technical writing

All major models now support tool use (MCP) and agentic patterns.

Research & Writing

  • Synthesis Over Summarization: AI tools increasingly synthesize multi-source inputs into structured insights rather than paraphrasing single documents.
  • Security & Privacy: Modern workspaces rely on isolated execution contexts; strong non-training guarantees apply primarily to paid and enterprise tiers.
  • Multimodal Capability: AI can interpret charts, screenshots, and handwritten notes and incorporate them into drafts.

For data analysis workspaces – comparison

Workspace 2026 Key Features
Anthropic Claude Artifacts • Creates interactive applications (tutors, calculators) within the output window.
• Real-time iteration on complex document structures.
OpenAI ChatGPT Canvas • Advanced frontier models with contextual persistence for tone and style.
• Inline editing with granular control over specific sections.
Google NotebookLM Interactive Audio Overviews with user interruption and questioning.
• Grounded citations linked directly to uploaded source segments.
Perplexity Pages • Multi-source synthesis using live web retrieval.
• Inline citation and consistency checking against sources.

Data Analysis details

  • Sandboxed Execution: Code runs in secure, ephemeral environments with no local system access.
  • Statistical Rigor: Strong support for Python-based libraries (e.g. pandas, scikit-learn) for exploratory and predictive analysis.
  • Direct Integration: AI can manipulate data directly within spreadsheets or dedicated analysis windows.
  • Limits: Reproducibility, package versions, and state persistence remain constrained relative to local workflows.

Data Analysis details

Workspace 2026 Key Features for Analysis
ChatGPT Data Analysis • Executes Python in managed compute environments for multi-file datasets.
• Assisted data cleaning and predictive modeling workflows.
Claude Analysis • High-fidelity SVG and lightweight interactive output.
• Fast iteration on statistical tables with publication-ready formatting.
Google Gemini in Sheets Multimodal cleaning: converts screenshots or PDFs into structured tables.
• Natural-language formula generation and transformations.
Microsoft Copilot in Excel Native Python-in-Excel for statistical scripts inside spreadsheets.
• Automated pivots, summaries, and forecasting via prompts.

Coding Assistance

  • Agent-Assisted Workflows: AI can coordinate multi-step tasks such as refactoring or bug fixing across large codebases, with human review.
  • Environment Security: Code is tested in secure sandboxes before changes are proposed.
  • Interconnected Tools: Integrated with development and collaboration platforms (e.g. Jira, Slack).

Coding Assistance — comparison

Workspace 2026 Key Features for Developers
Anthropic Claude Code • Native VS Code extension with agent-style workflows and inline diffs.
• Supports complex multi-file edits and testing assistance.
GitHub Copilot • Uses Extensions to interact with external dev tools (Azure, Slack, Jira).
• Deep context from local and remote repositories.
Cursor • AI-first editor with awareness of project-wide dependencies.
• Strong support for iterative refactors across files.
Windsurf (Codeium) Cascade mode for orchestrating large-scale refactors.
• Robust free tier for students and individual users.

Security note: SOC2 compliance is common; strict zero-retention guarantees typically apply to enterprise or explicitly configured accounts.

What changed from 2025 Q2

  • No need to think much re which models to use.
  • Leading models similar capability, but different. Not really sure how…

Feedback

Dear Reader. I have limited experience. Suggestions are welcome, please post an issue.