Most people starting in data science think they need a breakthrough idea before they can build a portfolio, but that's not the case.
I've been doing this long enough to have built across a range of domains: medical image classification with CNNs, natural language processing examples, generative models for drug discovery (not my specialty, but I put them together anyway), and heart-rate analysis for psychology research (unpublished as of yet). None of those started with a grand vision. They started with a concrete question and a willingness to follow it.
This post is about how you build a portfolio in 2026, when GenAI exists, and what a good one actually looks like.
There's a version of the argument that says GenAI has made portfolios obsolete: anyone can generate a Jupyter notebook, Copilot writes the boilerplate, so why bother?
That argument misunderstands what a portfolio is actually for. A portfolio is evidence of judgment. It shows that you can frame a problem correctly, choose the right technique instead of the flashiest one, recognize when results are wrong or misleading, and communicate findings to someone who didn't build the thing.
GenAI accelerates the execution layer, but it doesn't replace the judgment layer. A portfolio full of notebook shells that run without errors but contain no real thinking will look exactly like what it is: scaffolding with nothing inside.
A Note on "Failure": A model that performs poorly is not a failed project. A poor result that comes with a clear diagnosis (understanding why the model struggled, what the data could not support, or where the assumptions broke down) is often more impressive than a suspiciously clean accuracy score with no explanation.
Employers and collaborators have seen plenty of 98% accuracy claims on imbalanced datasets. Someone who can say the model underperformed because of class imbalance and detail what they tried to fix it is a far more credible candidate. Someone who builds a thoughtful portfolio stands out more now, not less.
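The class-imbalance point above is easy to demonstrate concretely. This is a minimal sketch on synthetic data (the 98% split and the seed are my choices, not from the post): a "model" that always predicts the majority class scores high accuracy while catching zero positive cases.

```python
# Why raw accuracy misleads on imbalanced data: a majority-class
# baseline on a synthetic dataset that is ~98% negative.
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

rng = np.random.default_rng(0)
y_true = (rng.random(1000) < 0.02).astype(int)  # ~2% positive class

# A "model" that always predicts the majority (negative) class
y_pred = np.zeros_like(y_true)

print(f"accuracy: {accuracy_score(y_true, y_pred):.2f}")  # looks impressive
print(f"recall:   {recall_score(y_true, y_pred, zero_division=0):.2f}")  # catches nothing
```

Being able to walk an interviewer through exactly this gap, and what you tried (resampling, class weights, a different metric), is the credibility the paragraph above describes.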
The most reliable source of project ideas is your own life. I ran a full analysis of my GoodReads reading history: NLP, sentiment analysis, reading pace trends, and predictive modeling on my own ratings. I had the data already. The question, "Do I rate books consistently, or do I have hidden biases?", came from genuine curiosity. That combination makes the work better and makes it far easier to explain in an interview.
Start with the data you already have. The question is usually already there too.
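To make the "data you already have" idea concrete, here is a hedged sketch of a first bias check on a GoodReads-style library export. The column names ("My Rating", "Average Rating") and the toy rows are assumptions for illustration; in practice you would `pd.read_csv(...)` your own export and adjust the names to match.

```python
# Sketch: do I systematically rate above or below the crowd?
# A toy DataFrame stands in for a real library export here;
# in practice: books = pd.read_csv("goodreads_library_export.csv")
import pandas as pd

books = pd.DataFrame({
    "Title": ["Dune", "Emma", "Solaris", "Middlemarch"],
    "My Rating": [5, 0, 4, 3],           # 0 = unrated in the export
    "Average Rating": [4.3, 4.0, 4.0, 4.1],
})

rated = books[books["My Rating"] > 0]    # drop unrated books
bias = (rated["My Rating"] - rated["Average Rating"]).mean()
print(f"Mean bias vs. the site average: {bias:+.2f} stars")
```

A one-number summary like this is only the opening move, but it turns a vague curiosity into a measurable question you can build a project around.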
Some prompts to get you started:
A frequent mistake is chasing novelty too early. You don't need an original research paper; you need to demonstrate competence. Reimplementing a paper on a new dataset, applying a known technique to a domain where it hasn't been used much, or improving a public Kaggle solution with better feature engineering all show genuine skill.
Where novelty does matter is in your hook. "I classified pneumonia in chest X-rays" is fine. "I classified pneumonia in chest X-rays, then diagnosed why the model struggled at the decision boundary and found it was misled by rib artifacts" is better. The novelty isn't the method; it's the thinking.
Before you start stacking projects, understand what employers or collaborators are looking for. It boils down to three things: Breadth (working across problem types), Depth (going beyond a basic tutorial), and Communication (explaining your work to others).
A good portfolio isn't ten classification problems with different datasets; it demonstrates range across different dimensions.
Here's a rough mix to aim for over time. You don't need all of these immediately, but knowing which types you're missing helps prioritize what to build next. If you want a single repo that walks through most of these from scratch - preprocessing, classification, regression, unsupervised learning, CNNs, RNNs and more - my Introductory Data Science repo covers all of it with worked examples.
A mature portfolio also has range in where the problems come from:
Most beginner portfolios are entirely solo. That's fine to start, but collaborative work tells a different story. Working on someone else's project (contributing to open source, submitting a pull request, or co-authoring an analysis) demonstrates that you can function in a team, read code you didn't write, and communicate technical decisions. Remember, contributing to open source doesn't have to mean heroic new features; documentation fixes and test coverage count.
A project without documentation is an exercise, not a portfolio piece. At a minimum, every project needs:
Documentation is the difference between a portfolio piece and a private experiment.
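As a starting point, a minimal README sketch along these lines covers what a stranger needs to get oriented. The section choices here are my suggestion, not a fixed checklist:

```markdown
# Project Name

One-sentence statement of the question the project answers.

## Data
Where the data comes from and how to obtain it (or why it can't be shared).

## How to run
    pip install -r requirements.txt
    python run_analysis.py

## Results
Headline findings, honest caveats, and what you'd try next.
```

If someone can clone the repo and reproduce your headline result from the README alone, the documentation is doing its job.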
If you want a practical system for keeping your project knowledge organised as your portfolio grows, Your Professional Second Brain for Local LLM Work covers exactly that.
You're going to use GenAI tools, and you should. The question is how you talk about it. Be honest. Saying, "I used Copilot to scaffold the preprocessing pipeline, then spent three days debugging the alignment issues it missed" shows you used the tool but understood its output.
What you can't do is let GenAI substitute for understanding. The practitioners who will have the best careers in this environment use GenAI to go faster at execution while maintaining judgment at the decision layer. If you're unsure whether coding is even worth learning alongside these tools, I wrote about that in Why It Still Matters to Learn to Code in the Age of AI.
Before you publish or share, run through this:
If you can check all of these, the project is portfolio-ready. If not, the gap is usually fixable in a few hours.
One of the most common portfolio mistakes isn't failing to start; it's starting, then abandoning. A repo that hasn't been touched in three years, with broken dependencies and a README that refers to future work that never happened, signals the opposite of what you want. It tells someone looking at your profile that you built something once and moved on.
An abandoned repo.
Keeping repos current is easier than it used to be. GenAI is genuinely useful here: updating a requirements file, refreshing a README, or refactoring a notebook so it runs on current library versions takes just an hour with the right tools. A portfolio that shows recent activity looks alive.
You don't need a perfect portfolio; you need a growing one. Start with something you actually care about, document it as though someone else will need to use it, and publish results honestly. The people with the most credible portfolios aren't the ones who planned the most impressive arc; they're the ones who kept building. (And no, you don't need a 365-day GitHub streak to prove it.)
How many projects do I need? Three solid, well-documented projects beat ten undocumented ones. Quality and range matter more than quantity. Aim for at least one supervised learning project, one that covers a different problem type, and one collaborative or research contribution.
Should I use Kaggle datasets or find my own? Both. Kaggle is fine for learning and benchmarking. But a portfolio anchored entirely in Kaggle datasets can look like practice rounds rather than independent thinking. At least one project where you sourced and cleaned your own data demonstrates a skill that competition datasets skip entirely.
Do I need to master the math first? Not upfront. You can build meaningful projects with a working understanding of what models do and when to use them. Deeper mathematical intuition helps as you go - especially when things break - but it's not a prerequisite for getting started.
Is a GitHub profile enough? Yes, for most purposes. GitHub is where people will look first. Make sure your profile is clean, your repos have descriptive names, and your READMEs actually render properly. A portfolio that works on GitHub Pages or links to a personal site is even better.