Introducing the World's First Large Visual Memory Model
Building Human-Like Memory for AI: Our Journey from Research to Reality
Today marks a pivotal moment not just for our team, but for the future of artificial intelligence. We're excited to announce that Memories.ai has emerged from stealth with $8 million in seed funding, led by Susa Ventures, with participation from Crane Venture Partners, Samsung Next, Fusion Fund, Seedcamp, and Creator Ventures. More importantly, we're announcing a breakthrough that we believe will fundamentally change how AI systems understand and interact with the visual world.
We are launching the world’s first Large Visual Memory Model (LVMM). For the first time, we've cracked the code on unlimited visual memory for AI.
The Memory Problem That's Been Holding AI Back
When I was 14 years old, I moved to the UK on a scholarship to attend high school, leaving behind everything familiar: my home, my community, my friends. In those early months, I experienced firsthand how memory shapes who we are. I watched myself adapt to a completely new educational system and culture. I saw how my memories of home became both more precious and more distant. But what struck me most was how visual these memories were. I could close my eyes and see my childhood bedroom, remember the exact layout of my neighborhood, recall the faces of friends I'd left behind. These weren't just facts stored in my mind; they were rich, visual experiences that I could revisit and connect to new experiences in my UK school.
Most importantly, I realized that adaptation isn't just about learning new things. It's about how you connect new experiences to existing memories, how you build context over time through these visual connections. That journey taught me something profound about intelligence itself: memory isn't just storage. It's the foundation of understanding, learning, and growth.
Fast forward to my time at Meta Reality Labs, where my co-founder Ben (Enmin) Zhou and I worked side by side, often until midnight, pushing the boundaries of what AI could do. We published four top-tier research papers in a single year (work that typically represents an entire PhD career). But the more we advanced AI's capabilities, the more we realized we were missing something fundamental.
Current AI systems are incredibly intelligent, but they lack persistent memory.
Think about it: ChatGPT forgets your conversation the moment it ends. Gemini can analyze a 1-hour video, but ask it about a movie trilogy and it's lost. Even the most advanced models from Google, OpenAI, and others hit a wall when it comes to persistent, long-term visual memory.
Building memory systems isn't just about scale—it's one of the most technically demanding challenges in AI. You need to solve compression without losing meaning, indexing without losing context, and retrieval without losing speed. Most importantly, you need to design systems that get smarter as they remember more, rather than slower. The engineering complexity is why most companies have focused on making models think faster rather than remember better.
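To make the indexing-versus-scanning tradeoff concrete, here is a deliberately simplified sketch (our names `ClipIndex`, `add_clip`, and `search` are illustrative, not the actual LVMM design): an inverted index maps each concept tag to the clips containing it, so retrieval cost scales with the number of matches rather than the size of the archive.

```python
from collections import defaultdict

# Toy sketch: an inverted index over video clips. Indexing work is paid
# once at ingest time; queries never scan the archive itself.
class ClipIndex:
    def __init__(self):
        self._index = defaultdict(set)  # tag -> set of clip ids

    def add_clip(self, clip_id, tags):
        # Record which clips mention each tag.
        for tag in tags:
            self._index[tag].add(clip_id)

    def search(self, *tags):
        # Retrieval is a set intersection over the posting lists,
        # independent of how many clips are stored overall.
        if not tags:
            return set()
        result = self._index[tags[0]].copy()
        for tag in tags[1:]:
            result &= self._index[tag]
        return result

index = ClipIndex()
index.add_clip("cam1_0900", ["bag", "terminal"])
index.add_clip("cam2_0915", ["bag", "platform"])
index.add_clip("cam1_1030", ["person", "terminal"])
print(index.search("bag", "terminal"))  # {'cam1_0900'}
```

The key property is that `search` touches only the posting lists for the queried tags, which is why a well-built index can keep getting larger without getting slower.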
That's the problem we set out to solve.
Why Visual Memory Changes Everything
Human intelligence isn't just about processing information. It's about building a rich, interconnected web of experiences and memories that inform every decision we make. When you see a friend across the street, you don't just recognize their face; you remember every conversation you've had, every shared experience, every context that makes them uniquely them.
This is what's been missing from AI. While companies have been racing to make models smarter, we realized the real breakthrough would come from making them remember. Specifically, giving them the ability to see, understand, and recall visual experiences across unlimited timeframes.
Our Large Visual Memory Model (LVMM) doesn't just process videos; it builds persistent visual memories that can be searched, connected, and reasoned about, just like human memory.
How We Fundamentally Reimagined Video Understanding
The traditional approach to video AI is like trying to read an entire library by carrying all the books at once. Current systems load entire videos into context memory, which creates severe limitations. Most can only handle 30 minutes of content at a time.
We took a completely different approach, inspired by how Google revolutionized web search. Instead of loading everything at once, we:
1. Compress videos into rich memory representations
2. Index them into searchable structures
3. Aggregate information across graphical sources
4. Serve relevant memories through instant retrieval
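The four steps above can be sketched in miniature. Everything here is an assumption for illustration (the function names, the dict-of-labels "memory" format, and the frame-sampling stand-in for a learned encoder); it shows the shape of the pipeline, not our implementation.

```python
def compress(video_frames):
    # 1. Compress: reduce raw frames to compact memory representations.
    #    Sampling every 10th frame stands in for a learned encoder.
    return video_frames[::10]

def index(memories):
    # 2. Index: organize memories into a searchable structure
    #    (here, a dict keyed by a toy "label").
    table = {}
    for i, mem in enumerate(memories):
        table.setdefault(mem["label"], []).append(i)
    return table

def aggregate(tables):
    # 3. Aggregate: merge indexes built from multiple sources.
    merged = {}
    for t in tables:
        for label, ids in t.items():
            merged.setdefault(label, []).extend(ids)
    return merged

def serve(merged, query):
    # 4. Serve: answer a query with a lookup instead of a scan.
    return merged.get(query, [])

# Toy archive: every 30th frame is labeled as a three-point shot.
frames = [{"label": "three_pointer" if i % 30 == 0 else "dribble"}
          for i in range(300)]
hits = serve(aggregate([index(compress(frames))]), "three_pointer")
print(len(hits))
```

The search-engine analogy holds at every step: ingest-time work (compress, index, aggregate) is amortized so that query-time work (serve) stays near-instant regardless of archive size.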
The result? For the first time, you can ask an AI system: "How many times has LeBron James made three-point shots in his entire career?" and get an answer based on analyzing decades of basketball footage. Or "Show me every time someone mentions Nike in social media videos this month" across millions of TikTok clips.
Our approach has achieved state-of-the-art performance across all major video understanding benchmarks, but more importantly, it's already solving real problems for real customers.
Real Impact, Real Applications
We're not building technology for the sake of technology. Our Large Visual Memory Model is already solving real problems across multiple industries.
Security and Safety: We're working with security companies to help them instantly search through months of camera footage. Instead of manually reviewing hours of video to find a specific incident, security teams can now ask questions like "Show me all instances of unattended bags in the main terminal" and get immediate results from massive video archives.
Entertainment and Media: Studios are using our technology to navigate decades of archived content. Directors can instantly find specific shots, scenes, or visual elements across entire production libraries—turning what used to be weeks of manual searching into seconds of intelligent retrieval.
Brand and Marketing Intelligence: We're helping companies understand their brand perception by analyzing social media video at scale. Our system has indexed over 1 million TikTok videos, allowing brands to ask questions like "What is the viral trend for cosmetics videos?" or "Which influencer has featured Tesla cars in their videos?" across millions of video posts.
AI Hardware: We're also working on AI hardware that integrates visual memory capabilities, preparing for next-generation user experiences built around a third core device that understands and remembers what you've seen.
We've seen customers transform their video archives from storage problems into visual insights goldmines. That's exactly what we set out to do.
The Vision That Drives Us Forward
This is just the beginning. Ben (Enmin) and I believe we're building the memory layer for the next generation of AI—the foundation that will enable truly human-like artificial intelligence.
Think of it this way: memories are essentially a library of videos, experiences, and contexts that define who we are. We want to host these memories for AI systems, giving them the rich contextual understanding they need to become truly personalized to each human they interact with. When an AI has access to your visual memories—what you've seen, done, and experienced—it can understand you in ways that current AI simply cannot.
This means AI assistants that remember every interaction you've ever had, that understand your preferences not just from what you say but from what you see and do. Picture humanoid robots that can navigate the world not just through programming, but through accumulated visual experiences. Think about smart glasses or wearables that remember every place you've been, every person you've met, every moment that matters. These systems won't just process your requests; they'll understand the full context of your life.
This isn't science fiction. It's the logical next step in AI evolution. While others focus on making AI think faster, we're focused on helping AI remember better and understand humans more deeply through unlimited visual context.
Why We Left Meta to Build This
People often ask why we left one of the world's leading AI research labs to start a company. The answer is straightforward: some problems are too important to wait for.
At Meta, we could publish papers and advance the field incrementally. But to truly solve the visual memory problem (to build the infrastructure that future visual AI will depend on) we needed the freedom to move fast, think differently, and take risks that large companies simply can't take.
The fact that we've gone from research to working product to funded company in such a compressed timeline proves that when you combine academic rigor with startup velocity, extraordinary things become possible.
The Team That Makes It Possible
None of this would be possible without Ben (Enmin), whose incredible engineering talent transforms ambitious ideas into working reality. Our unique partnership (my research background combined with his product engineering expertise) allowed us to achieve what typically takes much larger teams years to accomplish.
We're also incredibly grateful for the support of our investors at Susa Ventures and our other partners who understood the significance of what we're building before it was obvious to everyone else.
What Comes Next
Today's announcement marks the end of Chapter One. We're working on integrating our memory systems into mobile devices, preparing for a world where your phone doesn't just store your photos but understands and remembers them.
We're building partnerships with hardware companies, expanding our customer base across industries, and constantly pushing the boundaries of what's possible when AI systems can truly see and remember.
But most importantly, we're preparing for a future where the line between human and artificial memories begins to blur. Where AI systems don't just process information, but build understanding through accumulated experience.
An Invitation to Join the Journey
If you're a company drowning in video data, a researcher fascinated by the intersection of visual memory and intelligence, or simply curious about what AI can become when it remembers, I invite you to join us.
Visit Memories.ai—that's Memories with an 's', because we're building systems that can hold countless visual memories—to try our first-party application and see for yourself what unlimited visual memories can do. Reach out if you want to explore how this technology might transform your industry or application.
The future of AI isn't just about making machines smarter. It's about making them see and remember. And that future starts today.