Z Potentials | Joe Guo: The Creator of the AI Music Game-Changer Behind a $10M ARR and How ACE Studio Convinced 15 Grammy Winners to Ditch $500-an-hour Singers

Z Potentials invited Joe Guo, Founder and CEO of ACE Studio, to give a talk.

Mar 31, 2025

At a time when traditional music production remained an exclusive domain—barred by intricate music theory and prohibitive recording costs—a band of idealists began rewriting the rules with AI. Joe Guo’s decade-long journey, from lead singer to AI music entrepreneur, mirrors the industry’s shift from elite craftsmanship to universal expression. In 2019, driven by an almost obsessive belief in “de-tooling” creativity, he founded ACE Studio, bringing AI-powered vocal synthesis from professional studios to the desktops of everyday creators. When regulatory headwinds threatened to derail his vision, he executed a daring pivot—transforming ACE Studio from an entertainment app into a full-fledged productivity tool. While most AI music products continue to chase the spectacle of “one-click composition,” why does Joe’s team insist that Human-in-the-Loop is the real key to revolutionizing the industry?

Behind ACE Studio’s meteoric rise, from $50,000 to $800,000 in monthly revenue, was a bold, all-or-nothing bet: going global, physically. In 2023, with the domestic market reaching its limits, ACE Studio committed to overseas expansion, and in January 2024 it took a leap of faith at NAMM, the world’s premier music industry expo. There, a chance encounter with fifteen Grammy winners and nominees revealed an urgent but unmet global demand: AI-powered solutions for hard-to-find child vocals, multilingual choral arrangements, and real-time track modifications. They weren’t replacing software; they were replacing human labor costs, often priced at $500 per hour.

From systematically reaching key opinion leaders (KOLs) through Silicon Valley incubator HF0 to developing a proprietary 5-billion-parameter controllable AI model, this self-described “non-traditional” team has embraced a philosophy of relentless execution: act first, refine later. And Hollywood has taken notice: ACE Studio’s AI-powered vocal solutions are now becoming part of professional composers’ daily workflows. But their ambition goes far beyond software. If generative AI meets the music template economy, could this be the next Canva-style revolution in creative empowerment?

In this article, we are honored to invite Joe Guo, the founder and CEO of ACE Studio, to share his journey, insights, and vision for the future of AI-powered music. Enjoy!

Z Highlights

“I realized that we shouldn’t just conform to tools—we should make tools conform to us. Everyone has emotions and a need to express themselves. With this in mind, in early 2019, we decided to redefine music creation with AI. Humans should not be reduced to mere tools; we shouldn’t compete with machines in logic or speed. The human brain isn’t designed to process extreme complexity.”
“My aesthetic philosophy is de-tooling. As the ancient scholars said, ‘A gentleman is not a vessel’: humans are not meant to be mere instruments of labor. Tools should serve humans, enabling them to express big ideas and authenticity.”
“Transforming virtual singers into ACE Studio was about freeing creativity. We’re not replacing software that already solves 80% of the problem. We’re filling a gap that no existing product has addressed.”
“Our key to success is action. Instead of hesitating, we chose to act.”
“CapCut revolutionized video editing by streamlining and specializing; it expanded the creator base tenfold, even a hundredfold. The music industry will follow the same trajectory.”

01 From Vocalist to AI Music Pioneer: How Joe Guo is Redefining Music Creation

ZP: Welcome, Joe! Could you start by introducing yourself to our readers?

Joe Guo: Hello, everyone. I’m Joe, and I’m currently building ACE Studio, an AI-powered music workstation. Our core feature is “Text to Singing Voice”—you input lyrics and melodies, and our system generates vocal performances. We designed it specifically for professional musicians and creators.

Looking back, my journey started in college when I was in a band, writing original songs. My dream back then was simple: to produce professional-level music that was good enough for release. I had the inspiration, the songs had their own unique charm—but the barriers were too high. The production tools were complicated, clunky, and frustrating to use, which meant my music never saw the light of day.

After graduation, I worked at a gaming company, helping scale mobile games from zero to over a hundred million users. Then, from late 2015 to the end of 2016, I took a gap year—half in Beijing, half in Silicon Valley—to explore new possibilities. I met with entrepreneurs, studied what they were building, and, as someone without a technical background, I used this time to teach myself advanced mathematics, machine learning, programming, and improve my English. During my time in Silicon Valley, I also attended Draper University—a startup school founded by venture capitalist Tim Draper—where I had the chance to engage with investors like Elon Musk.

At the start of 2017, I returned to China and embarked on two startup ventures. My first attempt was a chatbot project, but neither the technology nor my own entrepreneurial experience was quite there yet. By the end of 2018, I decided to shut it down. After that, I began reassessing my next steps. My college experience in music production and my self-taught programming journey during my gap year stood in stark contrast, and I realized that technology wasn’t the real barrier; the real trouble was outdated, cumbersome tools. That’s when I concluded that tools should adapt to creators, not the other way around. Every artist is unique in emotion and expression. So, in early 2019, I made up my mind to redefine music creation with AI.

Over the next three years, we built and operated an app called ACE Virtual Singer, but in late 2022, it was taken down due to regulatory constraints. This setback led to an unexpected insight: many of our users weren’t just casual fans; they were professional musicians using our software to boost their productivity. We then reworked this entertainment app into desktop software built specifically for music production, which is now ACE Studio.

Since the launch in October 2023, the growth of ACE Studio has been remarkable. Initially, our monthly revenue was in the tens of thousands of dollars. By September 2024, after joining HF0 (ZP annotation: a startup incubator under Hacker Fellowship, supported by Hydra, that provides funding, resources, and support to highly technical founders building tech-driven companies), we had scaled to $80–90K per month. After graduating from the accelerator, our monthly revenue in December, January, and February skyrocketed to $800K.

ZP: You’ve been through multiple ventures. Looking back at your first startup in 2017—a chatbot project—what was the market like back then? What problem were you hoping to solve?

Joe Guo: Back then, I was an inexperienced entrepreneur, eager to work with AI but without a precise direction. Chatbots seemed futuristic and exciting, so we jumped in. Since general-purpose chatbots weren’t performing well, we looked for a niche use case and landed on the automotive industry—from pre-sale customer inquiries to post-sale maintenance support. The idea was simple: users could ask, “Something’s wrong with my car—what should I do?”, and the system would provide relevant information, troubleshooting steps, and repair guidance.

However, we ran into both business and technical challenges. Commercially, even if a bot could diagnose a problem, it couldn’t physically fix a car, which limited its value. Technologically, we were essentially trying to build a Retrieval-Augmented Generation (RAG) system long before RAG became mainstream. The challenge was allowing open-ended user queries while also ensuring accurate responses from a limited database. At the time, our models weren’t advanced enough, and the system relied too much on predefined rules, which made it difficult to deliver a truly useful experience.

ZP: What drew you to AI so early, back in 2017?

Joe Guo: It’s really about aesthetic fascination for me. AI, at its core, is a complex system: a vast network of simple components that interact in surprisingly intricate ways. Each individual element follows basic rules, but when they combine, something greater emerges, something we recognize as intelligence. That phenomenon deeply fascinated me.

ZP: As a serial entrepreneur, what drives you to keep building?

Joe Guo: That’s an interesting question. Many investors prefer founders who have already made a lot of money because they’re less focused on short-term gains and more interested in long-term impact. But I think there’s another type of entrepreneur worth paying attention to: the kind who has no choice but to build.

I fall into this second category. I start companies not because I want to but because I can’t see another way forward. I like doing things my way. If I were working at a traditional company, I’d have to bend to its structure, and the energy spent adapting would outweigh the energy spent creating. Also, because I’ve never made huge amounts of money, I have very few material desires. My goal is simple: build something extraordinary.

ZP: What did you expect from yourself 10 years ago? Have you achieved it? Looking ahead, what kind of person do you want to become in the next 10 years?

Joe Guo: A decade ago, I felt like I was wandering in the dark. It wasn’t until my gap year that I started gaining clarity. Back then, my dream was to build a startup in Silicon Valley—and it took me eight years to finally make that a reality.

Looking forward, I want to create something that genuinely impacts the world—and, more importantly, I want to shape it in line with my aesthetic vision. I want a world where creative expression is effortless, where making music feels as natural as speaking. Because at its core, music is an innate, childlike human ability—something we should all be able to do freely.

ZP: Outside of work, what are your hobbies?

Joe Guo: I love boxing. I trained for three years in Beijing and got to a semi-professional level, about as skilled as a typical boxing gym coach.

ZP: Could you recommend some books that have influenced you?

Joe Guo: One book that shaped my perspective is Deep Simplicity, which explores chaos theory and complex systems—it was fundamental in shaping my aesthetic sense.

Lately, I’ve been really into Phil Knight’s autobiography, Shoe Dog. It’s an honest, raw account of what it took to build Nike. One of the most powerful parts of the book is how Knight and his team were constantly underestimated—they were just a bunch of guys from Portland, seen as outsiders, nobodies. They even felt self-conscious about it. But the magic of Knight was that he brought out the best in those people, empowering them to turn their unique talents into something remarkable. That really resonated with me—it gave me a deeper understanding of what it takes to build a great company.

02 Let AI Redefine Music Creation: A Journey of Unparalleled Execution and Breaking Through in the Global Market

ZP: Returning to the starting point of this venture—when you first envisioned redefining music production with AI in 2019, what opportunities did you observe?

Joe Guo: When I returned from the U.S. in 2017, I was already searching for a completely new approach to music creation—or, more precisely, looking for an opportunity to remove the constraints of tools. This stemmed from my experiences during my gap year, where I learned a great deal and was astonished to find myself mastering subjects I had previously struggled with.

For instance, I had taken C++ courses in university. While I now realize that the curriculum was quite basic, at the time, it seemed utterly incomprehensible—like an alien script. Later, I realized that the problem wasn’t my aptitude but the way it was taught. The abstract concepts weren’t anchored to anything tangible or meaningful. It was only when I studied coding, advanced mathematics, and machine learning through MIT OpenCourseWare that I discovered a fundamental truth: all the seemingly complex and highly specialized knowledge we see today originates from simple ideas, which then layer upon each other. However, when this layering process is skipped—when people are force-fed the end result without context—it becomes overwhelmingly difficult to grasp.

This realization shaped my aesthetic view of the world: people share a broadly similar cognitive baseline, so no one should be reduced to a mere tool. Competing with computers in intelligence or computational speed is futile because the human brain is not designed for extreme complexity. If something appears overwhelmingly complex today, it is either because it was taught poorly, leading to a misunderstanding of its difficulty, or because the tool itself is poorly designed.

Take music creation, for example: overly complicated tools can hinder genuine human expression, making creators wrongly believe they lack talent. This reinforced my conviction that the key is de-tooling—people should not be turned into tools. Just as the ancients said, a gentleman is not a vessel—humanity should be served by tools, not constrained by them. Tools should empower individuals to freely express their emotions, big ideas, and authenticity.

ZP: This philosophy led to the creation of ACE Virtual Singer. What was the original product positioning, and why did you later pivot toward a more professional toolset?

Joe Guo: Our initial goal with the virtual singer was straightforward: to enable ordinary people to create and share music. However, beyond external regulatory challenges, the product faced a fundamental issue: it was neither accessible enough for entertainment users nor professional enough for productivity-focused users. This left it in an awkward middle ground where it struggled to gain traction. In hindsight, Suno represents the most successful iteration of this concept.

At that time, we started asking ourselves: what is the Matthew Effect in the AI era? Although no definitive answer exists, our intuition was that AI should be leveraged for creative tasks. Unlike the internet, which revolutionized human connectivity, AI fundamentally transforms the productivity of content creation. This realization led us to repurpose our virtual singer into ACE Studio, an actual tool for enhancing music creation.

ZP: After the pivot, how did you define your core value for professional users?

Joe Guo: Through our experience with ACE Virtual Singer, we identified a major pain point in music production: vocals are difficult to obtain. This mirrors the voiceover industry before text-to-speech (TTS) technology: previously, only professionals could record high-quality voiceovers, but TTS has since democratized the process. Similarly, in the music industry, 99% of music contains vocals, yet acquiring professional-quality singing remains costly and labor-intensive.

Take film scoring, for instance. Suppose a producer needs a child’s voice for a commercial. Finding a five-year-old with singing talent is an incredibly difficult task. With ACE Studio, however, they can instantly generate the perfect vocal performance without the logistical nightmare of hiring a child singer and booking a professional studio.

As we uncovered more real-world challenges like this, it became evident that AI could provide a direct solution—and notably, no traditional method existed before AI to address these issues efficiently. Unlike other AI applications that merely refine existing workflows, we were solving a problem that no prior tool or software could fully address.

To maximize versatility, we curated representative voices across various styles—opera, folk, children’s voices, and more. Additionally, we expanded language support beyond Chinese and English to include Spanish and Japanese, with plans to incorporate French, German, and others in the near future.

ZP: ACE Studio has seen remarkable progress. Can you share how you gradually found product-market fit (PMF)?

Joe Guo: When we launched ACE Virtual Singer, we explored many avenues. But with ACE Studio, we had direct insight into user behavior, which assured us there was demand. The key unknowns were: How many users would adopt it? How much revenue could we generate? How far could we scale?

Eventually, our instincts led us to expand overseas. The global market for music production is significantly larger, whereas China lacks a well-developed music ecosystem. We made a bold decision to fully commit to international expansion—a gamble that, in hindsight, paid off.

That said, even now, I believe we haven’t fully reached PMF yet. True PMF means that even if your monetization model is flawed, or your user onboarding process is weak, users will still persist, figure things out, and pay for the product. Examples include DocuSign and Meta’s ad platform—despite their imperfections, users remain deeply invested.

ZP: When did you decide to expand internationally?

Joe Guo: When we launched in early 2023, revenue was around $50,000–$60,000 per month. But over time, domestic demand plateaued: China had 600,000 registered musicians on NetEase Cloud Music, while Spotify had 20 million. The difference was staggering.

From the outset, our October 2023 launch was designed for global accessibility and payments. We also began marketing on YouTube, Twitter, and Discord, but adoption was slow. Despite reaching out to 100 overseas influencers, we received almost zero responses. At times, it felt discouraging. However, investors and friends reminded me: “International expansion is money on the table. If it isn’t working, the problem lies in your go-to-market (GTM), not the product.”

The turning point came in January 2024 when we attended the NAMM Show in the U.S. The event brought an influx of industry professionals—many discovered us through word-of-mouth and visited our booth. Later, we found that out of 60 attendees who left their contact information, 15 were Grammy winners or nominees. This proved that our product was genuinely valuable; the challenge was simply visibility.

This first wave of international users propelled our monthly revenue to $180,000. However, due to limitations in English-language vocal modeling, revenue then dipped to $80,000–$90,000 per month. Even so, the trajectory toward $1M ARR was clear.

The second wave came when we applied to HF0 (ZP annotation: HF0 is a permanent startup incubator co-founded by Lucy Guo and Dave Fontenot in 2019). Here, we were deeply immersed in growth strategies, during which we rolled out a significant number of new features and upgraded our Foundation Model to ensure a truly optimal user experience for Western audiences.

We also formalized influencer outreach into a standard operating procedure (SOP). Reflecting on why we initially received zero responses, we realized that outreach must be continuous and persistent. If no one in overseas markets has heard of you, early outreach is met with strong skepticism. The key is to establish some initial exposure, such as appearing at music festivals, to reassure potential users that you are not a fly-by-night operation.

At first, we had no clear benchmark for response rates. What percentage of replies should we expect from reaching out to 100 people? Eventually, we learned that a 10% response rate is reasonable. The real challenge was building a system capable of reaching out to 100 people per day in a sustainable manner. This was something I learned from Blake Anderson (ZP annotation: founder of Cal AI, a calorie-tracking AI application with million-dollar ARR) during my time at HF0. His strategy was to reach out to 200 influencers weekly, continuously refining outreach templates, optimizing reply rates, and improving RPM. Whenever a group of influencers reached a point of diminishing returns in cost-effectiveness, he would pivot to a new set. These are highly systematic, military-grade growth strategies.

“Follow the orthodox path before seeking unconventional breakthroughs”—the first priority is to execute the fundamentals systematically and thoroughly.

ZP: What were the key factors behind your success in expanding overseas?

Joe Guo: Looking at our team, none of us had a natural advantage in going global. I never studied abroad and didn’t even pass CET-4 (China’s College English Test, Band 4) in university. One of my co-founders was a music and arts student with only an undergraduate degree from China. The third co-founder spent just a year studying in the UK. So, we weren’t exactly the typical candidates for a successful international expansion.

But what made the difference was execution—we didn’t overthink things; we just took action. For instance, when I first heard about HF0 from other founders, I didn’t even know it was an accelerator. I simply looked it up online, found the application page, and spent about an hour filling it out. Four or five days later, I checked my inbox and saw an interview invitation. There was only one 15-minute slot left, so I booked it immediately. After the online interview, I flew to San Francisco the next day for the in-person interview. By the third day, I had the offer.

The whole process was so smooth it almost felt surreal. But many people hesitate at different points—wondering if it’s worth it, if they’ll pass the interview. I just thought it was the best opportunity to break into the global market, so I went for it.

The same mindset applied when we went to a U.S. music expo. We applied just a week in advance, made our own materials through Taobao, and when people said international shipping wouldn’t arrive in time, we just carried everything on the plane—even if it meant paying over $1,000 in fines. If we were missing something, we borrowed it locally. We just made it happen.

Some also said that not being native English speakers would hurt our chances of raising funds. But I’ve seen plenty of students who studied abroad whose English isn’t better than mine now. I just forced myself to do it, to learn by doing. Back in 2016, during my gap year abroad, I was too afraid to even speak English. Now, I can have fluent conversations with foreign investors.

ZP: How has your perspective on Total Addressable Market (TAM) evolved since expanding overseas?

Joe Guo: My thinking has changed significantly. The number of musicians worldwide is substantial, yet the music production industry hasn’t produced a truly dominant company. Most are just making $100M–$200M annually.

To understand why, we can look at the video industry 20 years ago. Back then, video production was highly complex, so the industry was fragmented into many small companies handling different parts of the process. Content creators were highly specialized. There weren’t many of them.

But today, CapCut has 900 million MAU, meaning roughly 1 in 10 people globally is now a video creator. CapCut achieved this by doing two things. First, it integrated a highly specialized, fragmented industry into a single, easy-to-use tool. Second, it expanded the user base by 10x or even 100x.

I believe music creation will follow the same trajectory. Right now, the industry is scattered across thousands of companies—some focus on plugins, others on audio fine-tuning, or virtual instruments. But ultimately, they all serve the same purpose: to produce a few minutes of audio content.

Our long-term vision is to build a simple yet powerful tool that integrates the entire music production process, making it accessible to both professionals and casual creators. And just like CapCut expanded the number of video creators, we aim to expand the number of music creators by 10x or even 100x.

ZP: How did you determine your pricing strategy?

Joe Guo: We started by studying our target audience’s spending habits. For instance, we found that some musicians are accustomed to paying $500 for a lifetime membership. With that in mind, we set our product pricing at around $200 per year, which felt like a more reasonable range. To finalize the price, we conducted three to four rounds of A/B testing over the past three months, and the final figures were determined based on those results.

Currently, our business model does not offer monthly subscriptions—only annual plans priced at $199 and $264. And yet, our conversion rates haven’t dropped. This decision, too, was driven by testing. In this industry, users tend to prefer one-time purchases over recurring payments. A monthly subscription model would require frequent use, but in reality, while our product is essential to musicians, it is not something they need every day. Many music creators might only produce a song every few months. For them, a single use of our product is already worth the cost—after all, hiring a singer in the U.S. typically costs between $300 and $500 per hour. That’s why an annual plan makes far more sense than a monthly one.
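As a rough check on the pricing argument above, the short Python sketch below computes how many hours of a hired singer's time cost the same as one annual plan. The prices and hourly rates are the figures quoted in the interview; the function and variable names are illustrative only and are not part of any ACE Studio product.

```python
# Figures quoted in the interview; all names here are illustrative assumptions.
ANNUAL_PLAN_PRICES_USD = (199, 264)
SINGER_HOURLY_RATES_USD = (300, 500)


def break_even_hours(plan_price_usd: float, hourly_rate_usd: float) -> float:
    """Hours of hired-singer time that cost the same as one annual plan."""
    return plan_price_usd / hourly_rate_usd


for price in ANNUAL_PLAN_PRICES_USD:
    # The highest hourly rate gives the shortest break-even time.
    shortest = break_even_hours(price, max(SINGER_HOURLY_RATES_USD))
    longest = break_even_hours(price, min(SINGER_HOURLY_RATES_USD))
    print(f"${price}/year plan = {shortest:.2f} to {longest:.2f} hours of a hired singer")
```

Under these figures, either plan costs less than a single hour of a hired singer, which is consistent with the claim that one use can justify the annual price.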

ZP: Can you introduce your current tech stack? What models and algorithms have you developed in-house?

Joe Guo: Our core technology revolves around AI-generated vocals. Music, at its core, is just multiple audio tracks layered together—just like how images are made of different layers. Once we have the vocal layer, we need additional layers (e.g., instruments) to complete the composition.

For instrumental tracks, we use a text-to-music approach, where users enter a prompt, and AI generates the corresponding melody. This works by integrating large models with ControlNet to ensure each audio track aligns seamlessly with the rest.

Right now, we’ve built a model comparable to Suno 2.5, but with more control features. It has 5 billion parameters and will enter beta testing in the next 1–2 months.

Last year’s biggest update was transitioning from a vocal workstation to a full-fledged music creation platform.
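To make the "layers" idea from this answer concrete, here is a minimal Python sketch of mixing tracks by summing their samples. It only illustrates the general principle of layering audio; it is not ACE Studio's implementation, and the `mix_tracks` function and the toy sample values are assumptions made for this example.

```python
# Hypothetical illustration: a "track" is a list of audio samples in [-1.0, 1.0].
def mix_tracks(tracks, gains=None):
    """Sum tracks sample by sample and clip the result to [-1.0, 1.0].

    Shorter tracks are treated as silence once they end.
    """
    if not tracks:
        return []
    gains = gains or [1.0] * len(tracks)
    length = max(len(track) for track in tracks)
    mixed = []
    for i in range(length):
        total = sum(gain * track[i] for track, gain in zip(tracks, gains) if i < len(track))
        mixed.append(max(-1.0, min(1.0, total)))  # hard clip to avoid overflow
    return mixed


vocal = [0.5, 0.25, -0.5, 0.0]    # toy vocal layer
piano = [0.25, 0.25, 0.25, 0.25]  # toy instrument layer
print(mix_tracks([vocal, piano]))  # prints [0.75, 0.5, -0.25, 0.25]
```

Real production tools work with far longer sample arrays and add per-track effects, but the core operation of combining a vocal layer with instrument layers is this kind of weighted sum.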

ZP: How does your differentiation from Suno play out in the short and long term?

Joe Guo: At present, the differences are quite significant. Suno is more consumer-focused (ToC), while we cater to professionals and creators (ToP). Our business model also diverges—Suno offers generous free usage, whereas we operate on an annual subscription basis.

In the long run, the distinction can be compared to the difference between ComfyUI/Krea and Midjourney/Flux. Users are no longer satisfied with “Model as a Product” alone; they seek deeper workflows and better interaction interfaces. This is why HF0 invested in us: they believe we can become the ComfyUI of the music industry.

03 Leveraging Foundational Model Iteration to Build Long-term Barriers with a Canva-style Ecosystem

ZP: What are your key product plans for the next 2–3 years? How do you plan to expand your product line?

Joe Guo: Our long-term goal is to reinvent how people create music.

Current AI music tools, like Suno, don’t actually help people create their own music—they just generate songs for them. That’s fine if you just want to send a song to your girlfriend, but true creativity requires a tool that helps users express themselves.

That’s why we’re not building an end-to-end product. Instead, we’re creating a fully-featured workstation, where AI serves as a creative assistant in a human-in-the-loop workflow. Users don’t need extensive musical training, but they do need to learn the tool—similar to how Cursor helps people write code with AI.

Another key shift is that AI music tools shouldn’t just be standalone models—they should be platforms. Our goal is to build an open ecosystem where different models can be integrated as plugins, giving users maximum flexibility in their creative process.

ZP: Your company has already made impressive progress in commercialization. How do you plan to maintain this first-mover advantage?

Joe Guo: First, our product is not easily copied. We specialize in vocal synthesis, a field that predates the GenAI wave and has undergone multiple iterations of technical architecture. The expertise required is substantial. For example, high-quality labeled data demands recording-studio-level datasets, and we’ve automated much of the annotation process. Then there’s the business side—negotiating revenue-sharing contracts with singers and structuring our monetization model. These factors create significant barriers for new entrants.

Second, we are poised to benefit from network effects. Inspired by Canva and CapCut, we are building a music-template community that integrates AI and large models. Users will be able to upload their music templates to our platform, allowing others to remix and refine them. For this to work, we need to provide an intuitive, modular creation tool—one that makes it easy for users to craft, adapt, and reuse music templates. Ultimately, we want this ecosystem to be financially rewarding for template creators as well.

ZP: Since entering the AI industry, what major breakthroughs and transformations have you observed in the audio space?

Joe Guo: The work we are doing today was theoretically possible two or three years ago—it’s just that we’ve been iterating alongside the technology. However, one real game-changer in recent years has been Text-to-Music. The fundamental reason for this shift is the rapid qualitative leap in large models.

We’re also seeing a trend toward standardization. Many content-generation approaches are converging on simpler Diffusion Transformer (DiT) architectures, with consensus forming around model structures and training paradigms. As a result, developing a foundation model today is less about reinventing the wheel and more about leveraging existing methodologies effectively. Many theories now suggest that as long as input-output processing and data handling are sound, the model itself will deliver decent results. This is a major variable that has only emerged in the past few years.

ZP: What are your expectations for AI advancements in the next 5 years?

Joe Guo: I believe the biggest breakthrough will be an LLM that truly understands music theory.

Today’s LLMs can write and debug code—not because they’re inherently good at it, but because we’ve built platforms that help them interpret and structure tasks. Music theory is actually much simpler than coding, yet AI models still struggle with it.

We’re working on fine-tuning models specifically for this purpose. If we succeed, AI-assisted music production will become as intuitive as AI-assisted coding.

Disclaimer

Please note that the content of this interview has been edited and approved by Joe Guo. It reflects the personal views of the interviewee. We encourage readers to engage by leaving comments and sharing their thoughts on this interview. For more information about ACE Studio, please visit the official ACE Studio website.

Z Potentials will continue to feature interviews with entrepreneurs in fields such as artificial intelligence, robotics, and globalization. We warmly invite those of you who are hopeful for the future to join our community to share, learn, and grow with us.

Image source: Joe Guo & ACE Studio.
