NLP Data Scientist

Our client is seeking a Senior NLP Data Scientist to lead technology development on the frontier of code discovery and developer productivity. A successful applicant can build software using a variety of technologies: a polyglot who learns and adapts to solve the problem at hand. You will help our team design, test, and rapidly iterate on multiple products and services stemming from our core technology. You will develop prototypes, tools, and methods that inform decision-making for software developers (e.g., “Is this the right solution to my coding problem?” or “How do I implement this specific code in my application?” or “What code libraries are other developers using to solve my problem?”).

About the Company 

The team is building an intelligent code discovery platform that gives developers the best tools to discover code in any form and be more productive. They are transforming code search to improve the practice of modern programming, taking a graph-based approach using data from the entire open source ecosystem. They’re on a mission to build the world’s best code discovery engine. The business is funded by top investors in Silicon Valley, including the first investors in Google, Twitter, Zoom, LinkedIn, and Uber. The team has backgrounds from NASA, LinkedIn, Facebook, Amazon, AWS, and Cisco, as well as MIT, Harvard, Stanford, and Berkeley. The company is based in San Francisco, California, but the team is fully remote and globally distributed.

Description

We seek a Senior NLP Data Scientist to lead technology development on the frontier of code discovery and developer productivity. You are a world-class data scientist and software engineer, writing top-notch code and designing production-grade NLP models. A successful applicant is an expert in data science, machine learning, software engineering, and complex data analysis spanning natural language, code syntax, and networks. You will help our team identify, analyze, and process large heterogeneous data sets. You will develop prototypes, tools, and methods that inform decision-making for software developers (e.g., “Is this the right solution to my coding problem?” or “How do I implement this specific code in my application?” or “What code libraries are other developers using to solve my problem?”).

Who Will Love This Job

The ideal candidate is excited to lead the direction of our data science and technology development. You are passionate about using machine learning to empower better software development, about designing and deploying production-grade NLP models, and about writing world-class code.

Stack

  • Backend, data-fetching pipelines, and tooling are built with Go

  • Frontend is built using TypeScript & Svelte

  • ML stack is built using Python & PyTorch

  • Cloud automation is built using Terraform

  • Data is primarily stored in PostgreSQL

  • Search engine is powered by OpenSearch

  • Services run on Google Cloud Platform

Responsibilities

  • Design and train production-grade NLP models

  • Build complete data processing systems that drive products, systems or applications 

  • Lead experimentation processes that accelerate prototyping and maximize resource utilization

  • Process data pipelines for machine learning operations: scheduling, ETL, dataflow programming, SQL, data labeling, representation learning, hyperparameter tuning, and model management

  • Produce and deploy internal and external APIs

  • Design and implement predictive models on multiple decision platforms

  • Apply the latest techniques from academic research to real-world problems in production environments

  • Review code, mentor other engineers and support the data science and engineering teams

  • Attract, recruit and retain top data science and engineering talent

Minimum Qualifications

  • Expertise in Natural Language Processing and Understanding (NLP & NLU)

  • Expertise in microservices and cloud computing on at least one cloud platform

  • Familiarity with distributed systems and the orchestration of large numbers of independent commodity machines into complete, functional systems to handle diverse workloads

  • Expertise in performing data science research

  • Expertise in writing world-class Python code

Preferred Qualifications

  • PhD in computer science, artificial intelligence, machine learning or related technical field

  • 10+ years of professional data science and software engineering experience

  • Advanced working knowledge of information retrieval and search technologies, including experience setting up and using open-source search systems to query and understand data

  • Expertise with Go

  • Experience with many of the following technologies:

      • Modern ML models (e.g., BERT)

      • Elasticsearch, Solr, or equivalent

      • Kubernetes, Docker, Terraform

      • Machine learning infrastructure

      • Deep learning, GNNs

      • CircleCI, GitHub Actions, Jenkins, or equivalent

      • Graph databases

What’s in the Offer

You have the opportunity to join an early-stage startup and have significant ownership of technology development. You will work at the highest level and collaborate with world-class colleagues, advisors, and technical experts.

  • Competitive salary & equity packages

  • Unlimited vacation and sick leave

  • Strong remote work culture and esprit de corps
