Founding AI Engineer
Company details
Company: 19TWELVE
Job type: Remote
Country: United States
City: Los Angeles
Experience: 4 years or more
Description of the offer
We are building the generative voice infrastructure for the Global South.
Current models are optimized for clean, formal English in high-resource environments. We are solving for the inverse: low-resource languages, high-noise environments, and heavy code-switching.
We are looking for a Systems Mechanic—a single, highly capable engineer who can own the technical spine of a generative audio engine. This is not a research role for writing papers. This is an applied engineering role for someone who can take open-source foundations and force them to perform in the real world.
The Engagement
- Structure: 3-Month Contract with clear deliverables.
- Objective: Deliver a functional, scalable inference engine that meets specific latency and quality benchmarks.
- Future: Successful delivery opens the door to a Founding Engineer role with significant equity.
The Engineering Challenge
You will be responsible for architecting and building the engine from the ground up. You must solve three specific constraints:
- The Data Reality: You will not have clean studio data. You must build a pipeline that can ingest “noisy” real-world audio (radio archives, podcasts, street interviews) and autonomously clean, align, and diarize it to create a high-fidelity training set.
- The Linguistic Complexity: The model must handle Code-Switching (fluidly mixing two languages in one sentence) and Tonal markers without breaking prosody. You must understand how to modify tokenizers to respect these nuances.
- The Inference Economics: We are not burning venture capital on infinite compute. You must quantize and optimize the model to run on consumer-grade GPUs with low latency. Efficiency is a constraint, not a nice-to-have.
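To give a flavor of the data work, here is a deliberately minimal sketch of the crudest first stage of such a cleaning pipeline: an energy gate that drops near-silent windows before any heavier denoising, alignment, or diarization. All names are illustrative, not a prescribed stack.

```python
# Crude energy gate: first pass of a noisy-audio cleaning pipeline.
# Splits raw samples into fixed windows and drops windows whose RMS
# energy falls below a noise floor estimated from the quietest windows.
import math

def rms(window):
    """Root-mean-square energy of one window of samples."""
    return math.sqrt(sum(s * s for s in window) / len(window))

def energy_gate(samples, window_size=160, keep_ratio=2.0):
    windows = [samples[i:i + window_size]
               for i in range(0, len(samples) - window_size + 1, window_size)]
    energies = sorted(rms(w) for w in windows)
    # Take roughly the 10th-percentile energy as the noise floor.
    noise_floor = energies[len(energies) // 10] if energies else 0.0
    return [w for w in windows if rms(w) > keep_ratio * noise_floor]
```

Real pipelines would follow this with source separation, forced alignment, and speaker diarization; the point is that each stage is a filter you can measure and tune.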
What You Will Own
- End-to-End Pipeline: From raw audio ingestion to served API response.
- Model Fine-Tuning: Adapting foundation models to highly specific, low-resource dialects.
- Inference Architecture: Building a stateless, containerized inference server that handles concurrent requests with sub-200ms latency.
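To illustrate what "stateless" means here, a toy sketch (the function names and payload fields are invented for this example): every request carries all of its own state, so concurrent calls share nothing and any container replica can serve any request.

```python
# Stateless request handling: no globals are read or mutated between
# calls, so the handler can be replicated behind a load balancer freely.
from concurrent.futures import ThreadPoolExecutor

def synthesize(request):
    """Placeholder for the real TTS forward pass."""
    text = request["text"]
    return {"audio_len": len(text) * 256,
            "voice": request.get("voice", "default")}

def serve_batch(requests, max_workers=8):
    # Concurrent, share-nothing handling; results come back in order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(synthesize, requests))
```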
The DNA We Need
- Systems Thinker: You don’t just train models; you build products. You understand how the model sits inside a container, how the API handles backpressure, and how the tokenizer affects the runtime.
- Data Realist: You know that 80% of the work is in the dataset. You are comfortable writing custom scripts to slice, denoise, and filter terabytes of audio.
- First-Principles Optimizer: You understand why a model is slow. You are comfortable with quantization, distillation, and kernel-level optimizations to squeeze performance out of limited hardware.
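As a concrete example of the kind of optimization we mean, a minimal sketch of symmetric int8 weight quantization in pure Python: store each tensor as int8 values plus a single float scale, cutting memory roughly 4x versus float32 at the cost of bounded rounding error.

```python
# Symmetric int8 quantization: map weights in [-max, +max] onto
# integers in [-127, 127] with one shared scale per tensor.
def quantize_int8(weights):
    scale = max(abs(w) for w in weights) / 127
    if scale == 0:
        scale = 1.0  # all-zero tensor: any scale round-trips correctly
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    # Reconstruction error per weight is at most scale / 2.
    return [x * scale for x in q]
```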
How to Apply
We do not read generic cover letters. To demonstrate your understanding of the problem space, please answer the following question in your application:
> “We need to fine-tune a generative voice model on a low-resource dialect that heavily mixes English with a tonal local language. The training data comes from noisy radio broadcasts.
> Describe your specific technical workflow to turn this raw audio into a clean, aligned dataset. How would you handle the tokenizer issues caused by the mixed languages?”
*(Answer in 3-5 sentences. Focus on the architectural approach, not specific tool names.)*
