Senior Software Engineer, LLM Performance
Parasail is redefining AI infrastructure by enabling seamless deployment across a distributed network of GPUs, optimizing for cost, performance, and flexibility. Our mission is to empower AI developers with a fast, cost-efficient, and scalable cloud experience—free from vendor lock-in and designed for the next generation of AI workloads.
Job Description:
The Senior Software Engineer, LLM Performance plays a crucial role in delivering a competitive platform by focusing on efficiently scheduling, executing, and managing AI workloads on distributed compute systems. This role is deeply technical, spanning from low-level GPU kernels to distributed AI orchestration and Kubernetes (K8s) deployments. It is about more than optimization; it’s about pioneering efficient infrastructure that supports AI’s transformative role in reshaping productivity, revolutionizing industries, and addressing some of the world’s most challenging problems. You’ll ensure that generative AI — including large language models (LLMs), multi-modal models, and diffusion models — operates efficiently at enterprise scale while driving continuous improvements in cost, performance, and sustainability.
Responsibilities:
Add support for new LLMs, working across the stack from low-level GPU kernels to Kubernetes-based deployments.
Contribute to cutting-edge open-source LLM engines such as vLLM or SGLang to extend their capabilities and performance (e.g., improving their Python API servers or request schedulers).
Operate closer to the hardware, focusing on building and integrating solutions to boost performance and hardware utilization. For example, improve attention backends like FlashAttention or FlashInfer by contributing to their development and optimization, or by integrating their solutions into vLLM.
Improve LLM performance using advanced algorithmic solutions such as speculative decoding, quantization, or other state-of-the-art techniques. Understand the impact of such techniques on model quality.
Qualifications:
Expertise in GPU computing, from low-level platforms such as CUDA and ROCm to compilers and frameworks such as XLA, PyTorch, and JAX.
Background in performance analysis and optimization of AI/HPC workloads (e.g., profiling or theoretical analysis of FLOPs and memory bandwidth).
Experience writing GPU kernels with technologies such as CUDA, CUTLASS, or Triton.
Strength in Python and C++.
Demonstrated contributions to open-source projects. Contributions to inference engines such as vLLM are a strong plus.
A production-oriented mindset emphasizing robust, scalable code suitable for enterprise-grade applications.
A relentless curiosity about cutting-edge AI technologies combined with a passion for solving complex problems.
What You Bring to the Table:
We are looking for people who are eager to learn and master the lower-level compute concepts that are critical for the AI revolution. With us, your skills will not only contribute to coding but will also have a significant impact on the scalability and efficiency of AI applications at large. If you're geared up for the challenge of optimizing AI performance and eager to push our technological prowess to new heights, we're excited to welcome you aboard.