Research Scientist, Voice AI

Podcastle

Software Engineering, Data Science

Yerevan, Armenia

Posted on Feb 17, 2026

About the company:

We’re a technology-first team building the next generation of AI-powered audio and video creation. Our proprietary text-to-speech models rank among the top on Hugging Face leaderboards, and our research spans voice cloning, audio processing, and video understanding.

At the core of what we build is real-time, low-latency voice technology that delivers expressive, controllable, multilingual speech. We care deeply about production reliability at scale, which means consistent voice quality, fast response times, robust streaming, and tools that are easy to integrate and trust.

About the team:

You’ll be joining a team that works at the intersection of cutting-edge AI and real-world creative workflows, with a shared mission to make professional-grade content creation accessible and genuinely enjoyable.

About the role:

We’re hiring a Research Scientist, Voice AI to push the frontier of human-sounding, controllable, multilingual speech and turn breakthroughs into production-grade systems. This role is for someone who loves deep research and also cares about making models fast, stable, and usable in the real world.

You’ll take full ownership of research initiatives, from shaping the initial idea through running large-scale experiments and evaluating results, all the way to preparing models for real-world deployment.

Core research tracks:

Next-gen TTS — Build streaming and high-fidelity models optimized for real-time latency, naturalness, and production stability
Audio tokens & codecs — Design scalable discrete representations that make speech models faster, more efficient, and easier to control
Controllable speech — Enable expressive generation through speed, style, emotion, voice design, and open-ended instruction control
Learning for quality — Push quality beyond supervised losses using stronger signals: preference optimization / RL-style approaches, speech critics, and evaluation systems that correlate better with real user perception
Multilingual robustness — Scale quality across languages, scripts, and real-world edge cases

Typical responsibilities include:

Run end-to-end research cycles: turn ideas into experiments, training runs, and clear conclusions
Design and improve generative speech models for both streaming and high-quality use cases
Measure what matters: create strong evals (objective + human/perceptual) and use better signals to improve behavior
Collaborate with ML/infra/engineering to deploy research into production APIs
Communicate clearly: what you tested, what improved, what didn’t, and what we should do next

We’re looking for someone with the following skills and qualifications:

2+ years building deep learning systems (industry or academia), with a track record of owning work end-to-end
Strong fundamentals in representation learning, sequence modeling, and modern generative modeling
Hands-on experience with Transformers/LLM-style training, diffusion/flow models, and/or audio generation
Excellent Python + PyTorch — you’re comfortable running large training jobs and debugging tricky training issues
Research-to-production mindset — you care about speed/latency, robustness, reproducibility, and clean integration into real products

Nice to have:

Prior work in TTS, voice conversion, speech enhancement, or ASR
Experience with discrete audio modeling (codecs, vector quantization, token LMs)
Publications, open-source contributions, or notable applied research projects
Practical experience with streaming inference / real-time constraints

Why Async?

Startup Environment: Experience the energy and agility of a fast-growing startup where your contributions directly shape product direction and company success
Cutting-Edge AI Technologies: Work with the latest AI agents and LLMs at the forefront of audio and video AI innovation
Professional Development: Grow your skills in AI and distributed systems
Flat Company Structure: Collaborate directly and make decisions quickly
Health Insurance: Get coverage with comprehensive benefits
Gym Membership: Use benefits that support your well-being
Pioneering Mindset: Work alongside innovative people pushing the boundaries of what's possible

At Async, we believe artificial intelligence has the potential to help people solve immense creative challenges, and we want the upside of AI-powered content creation to be widely shared. Join us in shaping the future of audio and video technology.
