Senior Data Scientist

Senior Data Scientist – NLP

About the Company 

Our team is building an intelligent code discovery platform that gives developers the best tools to discover code in any form and be more productive. We are transforming code search to improve the practice of modern programming, taking a graph-based approach that uses data from the entire open-source ecosystem. We are on a mission to build the world’s best code discovery engine. The business is funded by top investors in Silicon Valley, including the first investors in Google, Twitter, Zoom, LinkedIn, and Uber. Our team has backgrounds from NASA, LinkedIn, Facebook, Amazon, AWS, and Cisco, as well as MIT, Harvard, Stanford, and Berkeley. Our company is based in San Francisco, California, but the team is fully remote and globally distributed.

Description

We seek a Senior NLP Data Scientist to lead technology development on the frontier of code discovery and developer productivity. You are a world-class data scientist and software engineer who writes top-notch code and designs production-grade NLP models. You are an expert in data science, machine learning, software engineering, and complex data analysis spanning natural language, code syntax, and networks. You will help our team identify, analyze, and process large heterogeneous data sets. You will develop prototypes, tools, and methods that inform decision-making for software developers (e.g., “Is this the right solution to my coding problem?”, “How do I implement this specific code in my application?”, or “What code libraries are other developers using to solve my problem?”).

Who Will Love This Job

The ideal candidate is excited to lead the direction of our data science and technology development. You are passionate about using machine learning to empower better software development, designing and deploying production-grade NLP models, and writing world-class code.

Stack

  • Backend, data-fetching pipelines, and tooling are built with Go

  • The frontend is built using TypeScript & Svelte

  • ML stack is built using Python & PyTorch

  • Cloud automation is built using Terraform

  • Data is primarily stored in PostgreSQL

  • The search engine is powered by OpenSearch

  • Services run on Google Cloud Platform

Responsibilities

  • Design and train production-grade NLP models

  • Build complete data processing systems that drive products, systems or applications 

  • Lead experimentation processes that accelerate prototyping and maximize resource utilization

  • Process data pipelines for machine learning operations: scheduling, ETL, dataflow programming, SQL, data labeling, representation learning, hyperparameter tuning, and model management

  • Produce and deploy internal and external APIs

  • Design and implement predictive models on multiple decision platforms

  • Apply the latest techniques of academic research to real-world problems in the production environment

  • Review code, mentor other engineers and support the data science and engineering teams

  • Attract, recruit and retain top data science and engineering talent

Minimum Qualifications

  • Expertise in Natural Language Processing and Understanding (NLP & NLU)

  • Expertise in microservices and cloud computing on at least one cloud platform

  • Familiarity with distributed systems and the orchestration of large numbers of independent commodity machines into complete, functional systems that handle diverse workloads

  • Expertise in performing data science research

  • Expertise in writing world-class Python code

Preferred Qualifications

  • PhD in computer science, artificial intelligence, machine learning or a related technical field

  • 10+ years of professional data science and software engineering experience

  • Advanced working knowledge of information retrieval and search technologies, including hands-on experience setting up and using open-source search systems to query and understand data

  • Expertise with Go

  • Experience with many of the following technologies:

      • Modern ML models (e.g. BERT)

      • Elasticsearch, Solr, or equivalent

      • Kubernetes, Docker, Terraform

      • Machine learning infrastructure

      • Deep learning, GNNs

      • CircleCI, GitHub Actions, Jenkins, or equivalent

      • Graph databases

What’s in the Offer

You have the opportunity to join an early-stage startup and have significant ownership of technology development. You will work at the highest level and collaborate with world-class colleagues, advisors, and technical experts.

  • Competitive salary & equity packages

  • Unlimited vacation and sick leave

  • Strong remote work culture and esprit de corps

Diversity Commitment: We are focused on building a diverse and inclusive team. We welcome people of all backgrounds, experiences, abilities, and perspectives and are an equal opportunity employer. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status. Learn more at archipelo.com.
