Research Scientist, Voice AI

Podcastle

Software Engineering, Data Science
Yerevan, Armenia
Posted on Feb 17, 2026

About the company:

We’re a technology-first team building the next generation of AI-powered audio and video creation. Our proprietary text-to-speech models rank among the top on Hugging Face leaderboards, and our research spans voice cloning, audio processing, and video understanding.

At the core of what we build is real-time, low-latency voice technology that delivers expressive, controllable, multilingual speech. We care deeply about production reliability at scale, which means consistent voice quality, fast response times, robust streaming, and tools that are easy to integrate and trust.

About the team:

You’ll be joining a team that works at the intersection of cutting-edge AI and real-world creative workflows, with a shared mission to make professional-grade content creation accessible and genuinely enjoyable.

About the role:

We’re hiring a Research Scientist, Voice AI to push the frontier of human-sounding, controllable, multilingual speech and turn breakthroughs into production-grade systems. This role is for someone who loves deep research and also cares about making models fast, stable, and usable in the real world.

You’ll take full ownership of research initiatives: shaping the initial idea, running large-scale experiments, evaluating results, and preparing models for real-world deployment.

Core research tracks:

  • Next-gen TTS — Build streaming and high-fidelity models optimized for real-time latency, naturalness, and production stability
  • Audio tokens & codecs — Design scalable discrete representations that make speech models faster, more efficient, and easier to control
  • Controllable speech — Enable expressive generation through speed, style, emotion, voice design, and open-ended instruction control
  • Learning for quality — Push quality beyond supervised losses using stronger signals: preference optimization / RL-style approaches, speech critics, and evaluation systems that correlate better with real user perception
  • Multilingual robustness — Scale quality across languages, scripts, and real-world edge cases

Typical responsibilities include:

  • Run end-to-end research cycles: turn ideas into experiments, training runs, and clear conclusions
  • Design and improve generative speech models for both streaming and high-quality use cases
  • Measure what matters: create strong evals (objective + human/perceptual) and use better signals to improve behavior
  • Collaborate with ML/infra/engineering to deploy research into production APIs
  • Communicate clearly: what you tested, what improved, what didn’t, and what we should do next

We’re looking for someone with the following skills and qualifications:

  • 2+ years building deep learning systems (industry or academia), with a track record of owning work end-to-end
  • Strong fundamentals in representation learning, sequence modeling, and modern generative modeling
  • Hands-on experience with Transformers/LLM-style training, diffusion/flow models, and/or audio generation
  • Excellent Python + PyTorch — you’re comfortable running large training jobs and debugging tricky training issues
  • Research-to-production mindset — you care about speed/latency, robustness, reproducibility, and clean integration into real products

Nice to have:

  • Prior work in TTS, voice conversion, speech enhancement, or ASR
  • Experience with discrete audio modeling (codecs, vector quantization, token LMs)
  • Publications, open-source contributions, or notable applied research projects
  • Practical experience with streaming inference / real-time constraints

Why Async?

  • Startup Environment: Experience the energy and agility of a fast-growing startup where your contributions directly shape product direction and company success
  • Cutting-Edge AI Technologies: Work with the latest AI agents and LLMs at the forefront of audio and video AI innovation
  • Professional Development: Grow your skills in AI and distributed systems
  • Flat Company Structure: Collaborate directly and make decisions quickly
  • Health Insurance: Coverage with comprehensive benefits
  • Gym Membership: Benefits to support your well-being
  • Pioneering Mindset: Work alongside innovative people pushing the boundaries of what's possible

At Async, we believe artificial intelligence has the potential to help people solve immense creative challenges, and we want the upside of AI-powered content creation to be widely shared. Join us in shaping the future of audio and video technology.