In the Loop: Week Ending 2/14/26
A couple weeks ago, I was flying to St. Louis for a three-day client workshop and, like I often do, got to wondering what I was flying over. I've always had to guess, but on this trip, I decided to try out Gemini, Google’s AI-powered multimodal tool, to see if it could help me figure out where I was.
I fired up the app, took a photo out the window of the plane, and gave it a simple prompt: "Based on this image, where do you think I am? Talk me through how you came to this conclusion." I wanted to see what Gemini was capable of, but I also wanted to understand the process it took to get there.

Here is Gemini's response:
The image you sent shows a view from an airplane window, with a clear blue sky and a landscape of farmland below. Here's how I can use this information to make an educated guess about your location.

From there, it walked through the clues it could see (the landscape, the river, the terrain, and the time of day) and offered a few regions that came to mind.

Gemini asked for a bit more context, so I took a second photo. Its second response narrowed my location down significantly, walking through its observations and quickly determining that the river below was likely the Mississippi:

Based on these observations, the most likely location for the image is the Southern United States, specifically the region around Mississippi, Louisiana, or Arkansas. The Mississippi River is prominent in these states, and the climate in the South is generally milder, which would explain the later foliage change.
Once again, Gemini asked me for a bit more context, so I took a third photo, this one showing a large bend in the river below. I also told Gemini that it was 3:52 in the afternoon and asked it to tell me exactly where I was.

“The third image shows a distinctive bend in a river, which is a strong clue. This bend is likely the "Great Bend" of the Mississippi River, located near Memphis, Tennessee.”
A quick look at the flight map on the back of the seat in front of me confirmed that I was, in fact, flying over Memphis at that moment.

My jaw dropped. Using only three (not very good) photos and a little bit of additional context, Gemini pegged exactly where I was in less than two minutes.
I revisited this experience yesterday when I saw that Google had released Gemini 2.0, a further step forward in multimodal capability. If you're not familiar with the term, multimodal means that an AI tool can now both take in and produce many different kinds of information.
When ChatGPT launched, it was simply a text-to-text tool: you typed in a text prompt, it spit out a text response. But with multimodal, you can now do text-to-code, voice-to-image, image-to-movie, spreadsheet-to-graph, or, as in my case, “image-to-where-in-the-world-is-Matt?” The list goes on…
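If you're curious what an "image-plus-text in, text out" request actually looks like outside the app, it only takes a few lines of code. Here's a minimal sketch, assuming Google's google-generativeai Python SDK; the API key, model name, and photo filename are placeholders, and the consumer Gemini app does more than this under the hood.

```python
# Minimal sketch of a multimodal request (image + text in, text out)
# using the google-generativeai Python SDK. The API key, model name,
# and filename below are placeholders.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")

model = genai.GenerativeModel("gemini-1.5-flash")
window_photo = Image.open("window_seat.jpg")  # photo taken out the plane window

response = model.generate_content([
    window_photo,
    "Based on this image, where do you think I am? "
    "Talk me through how you came to this conclusion.",
])

print(response.text)  # the model's text answer: a region guess and its reasoning
```

The specific SDK isn't the point; what matters is that a single call can mix modalities, taking an image and a prompt in and returning text.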
So what does this mean? As we saw on 60 Minutes last weekend, multimodal can help Anderson Cooper draw a more accurate human body. It can teach the OpenAI team how to make pour-over coffee. It can even teach you how to make text “wiggle” in Adobe After Effects.

The same 60 Minutes segment that showed Anderson Cooper learning which side of the body the liver is on also highlighted Khanmigo, the new AI-driven tutoring tool from Khan Academy, the education nonprofit that has revolutionized online learning. The goal of Khanmigo is to provide one-on-one, real-time tutoring that lets teachers focus on each student's specific learning needs rather than on the time-consuming process of reviewing and grading dozens or even hundreds of papers and tests.
Rather than making teachers obsolete, it puts them back into the role of truly teaching.
Imagine a multimodal AI that can look at a student’s work and provide real-time feedback that helps them learn more effectively. Or an AI that can be used in an operating room to avoid surgical complications. Or an AI that can look at an airplane engine and determine if it needs maintenance.
There's been a lot of talk this year about AI agents, but the truth is that agents aren't ready for primetime and may not be for a while. Multimodal AI, on the other hand, is here now, and I don't think it's an exaggeration to say its potential is nearly limitless.