The Distributed SQL Blog

Thoughts on distributed databases, open source and cloud native

My Experience as a YugaByte DB Engineering Intern

I recently finished my winter 2019 term as a software engineering intern at YugaByte, and it was a fantastic experience! I found the opportunity through a posting on the University of Waterloo internal job board, which was also the first time I had heard of YugaByte DB. I had previously completed two internships at much larger companies and was a little skeptical of startups. The fact that a friend’s internship at a startup had ended halfway through because the startup folded did not help allay my concerns. But after doing some research on Crunchbase and other sources, I decided that YugaByte would be a good choice because of its very recent and successful funding round, the fact that it had an actual product, and the enterprise database space it targeted.

The Application and Interview Process

The application and interview process was very smooth, and the YugaByte team was accommodating and prompt. I had two 45-minute interviews with Ram, a full-stack engineer, and Kannan, Co-Founder & CEO. It was the passion and expertise of those two that convinced me to take this opportunity over several others I had. As a Canadian, moving to and working in Silicon Valley requires a lot of extra steps, foremost being the work visa. But again, YugaByte was swift and supportive with everything I needed, and multiple co-workers reached out to offer their assistance with my relocation.

Day One: Onboarding and Bug Fixing

Going into the first day of work, I had already tempered my expectations: this was a startup, so I anticipated a disorganized development cycle with minimal testing, little documentation, and a buggy dev environment. I got none of those. Within the first few hours, I was able to get registered in the HR portal and set up my VPN, my IDEs with the company’s style guide, and all the build tools I needed. This streamlined experience was due to the extensive onboarding documentation and support from my coworkers.

After I got my workstation set up, Bogdan, my mentor, pulled me and another intern into a room and gave us a thorough breakdown of the stack, highlighting three core areas: the core distributed database, written in C++; the language layer, also in C++ but with YACC and Lex as well; and the full-stack management platform, which uses Java and Python. And then, to my surprise, we were asked what we wanted to work on. In my previous internships, I was simply assigned to whichever project the hiring team decided was the best fit for my skills. At that point, I wasn’t sure what I was interested in, so I was given exploratory tasks in each of the areas so that I could gauge my enthusiasm for each one.

By the end of the day, I had built the database from source, and already started working on my first bug to get familiar with the codebase!

First Big Task: Blacklisting Nodes

My first substantial task was an issue with the node blacklisting feature: blacklisted nodes were not properly recognized by the load balancer, so they were given load anyway. As before, Bogdan explained the general context of the problem on a whiteboard. But, to my surprise, he left the solution entirely up to me. This was another difference from my previous experience, where I was usually told how to go about solving the problem. After about a week of work, I was ready to get my first code review. However, contrary to my assumptions, it wasn’t my mentor who reviewed my code but someone with whom I had only exchanged introductions. That further reinforced my admiration for how passionate and helpful this team was: everyone was willing to jump in even without being the official reviewer. They pointed out a lot of things I could improve in my code, but the feedback was always constructive, and they were always willing to answer any and all questions I had. After passing through several rounds of review and 5 different test and build pipelines, I pushed my first major change to the master branch.
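To make the blacklisting requirement concrete, here is a minimal sketch of what "the load balancer must skip blacklisted nodes" means. It is an illustration only, not YugaByte DB's load balancer (which is written in C++); the function name `pick_target_node` and the node names are hypothetical.

```python
from typing import Dict, Iterable, Optional


def pick_target_node(load_by_node: Dict[str, int],
                     blacklist: Iterable[str]) -> Optional[str]:
    """Return the least-loaded node that is not blacklisted, or None.

    Hypothetical helper for illustration; not part of any real API.
    """
    blocked = set(blacklist)
    # Filter blacklisted nodes out *before* comparing loads, so a
    # blacklisted node can never be chosen even if it is the least loaded.
    candidates = {node: load for node, load in load_by_node.items()
                  if node not in blocked}
    if not candidates:
        return None
    # Break ties by node name so the choice is deterministic.
    return min(candidates, key=lambda node: (candidates[node], node))


if __name__ == "__main__":
    loads = {"node-a": 1, "node-b": 5, "node-c": 3}
    print(pick_target_node(loads, blacklist=["node-a"]))
```

In this toy version the bug described above would correspond to comparing loads without applying the filter first, which would hand new load to `node-a` despite the blacklist.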

4 Months of Contributions

In my 4 months at YugaByte, I ended up working across the entire stack. On the language layer, I fixed and upgraded the timestamp datatype to be more consistent and accept significantly more formats, and I added the EXPLAIN PLAN command to the YCQL API so that it works similarly to its counterpart in most popular SQL databases. The feature analyzes a SELECT, INSERT, UPDATE, or DELETE statement and returns details of the internal execution plan to help users optimize their queries. Working in this area helped me better understand how all the different components of a modern compiler work, especially semantic analysis. I also learned the intricacies of how databases are designed and the internal mechanisms they use to optimize performance.

On the full-stack management platform, I pushed three big features. I added telemetry to the platform and an internal server to receive, process, and combine the metrics with the rest of our customer and marketing data. I automated the deployment of database clusters, where previously users had to SSH into each of the servers or Kubernetes pods, manually update gflags, and run custom scripts. I also added automatic deployment of TLS-enabled clusters and SSL certificate management, where previously users had to provide or generate their own root certificate and client certs, then manually distribute and keep track of them across the nodes. This project significantly expanded my understanding of Kubernetes, networking, and security.

Finally, on the core database product, in addition to the node blacklisting issue, I worked on adding Change Data Capture (CDC). Distributed systems are, to me, the most complex area of the stack. It took me a few weeks of reading documentation, tech blogs, and academic papers, including the Raft consensus and Google Spanner papers, to get even a basic grasp of YugaByte DB’s innovative design. CDC is a feature that enables users to export changes to data in a database to external sinks. It exists in traditional databases like PostgreSQL and MySQL, but it is a much more complex problem in a distributed database like YugaByte DB. I was given a blank slate and full end-to-end responsibility, so I spent the first few weeks researching and prototyping alongside a co-worker who was working on two-data-center replication, which relies on CDC. The resulting design doc went through many review cycles over a few more weeks, and before I finished my internship, I created a working prototype of a CDC service that exports changes in the database to Elasticsearch. Through this project, I learned a lot about distributed systems and enjoyed it so much that I’m considering it for my future career path.
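For readers new to CDC, the general idea can be sketched in a few lines: every write is appended to an ordered change log, and an exporter forwards entries past a checkpoint to a sink. This is a single-process toy under simplifying assumptions, not the design of YugaByte DB's CDC service; `ToyTable`, `Change`, and `export_changes` are hypothetical names, and the sink here is just a Python callable standing in for something like an Elasticsearch writer.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Change:
    seq: int                     # position in the change log, starting at 1
    op: str                      # "upsert" or "delete"
    key: str
    value: Optional[str] = None  # None for deletes


@dataclass
class ToyTable:
    """In-memory table that records every write in an ordered log."""
    rows: Dict[str, str] = field(default_factory=dict)
    log: List[Change] = field(default_factory=list)

    def upsert(self, key: str, value: str) -> None:
        self.rows[key] = value
        self.log.append(Change(len(self.log) + 1, "upsert", key, value))

    def delete(self, key: str) -> None:
        self.rows.pop(key, None)
        self.log.append(Change(len(self.log) + 1, "delete", key))


def export_changes(table: ToyTable,
                   sink: Callable[[Change], None],
                   checkpoint: int = 0) -> int:
    """Send every change after `checkpoint` to `sink`, in log order.

    Returns the new checkpoint so the next call resumes where this one ended.
    """
    for change in table.log:
        if change.seq > checkpoint:
            sink(change)
            checkpoint = change.seq
    return checkpoint


if __name__ == "__main__":
    table = ToyTable()
    table.upsert("user:1", "alice")
    table.delete("user:1")
    exported: List[Change] = []
    checkpoint = export_changes(table, exported.append)
    print(checkpoint, [c.op for c in exported])
```

With a single node, one log and one checkpoint are enough. In a distributed database the data is spread across many nodes, so ordering changes and tracking progress become much harder, which is the complexity the paragraph above refers to.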

The Internship in Retrospect

While I gained considerable technical skills and knowledge, I learned much more than that. I had weekly meetings with my mentor Bogdan, which were 45 minutes to an hour of just walking around the block and chatting about anything. Our discussions covered interesting fields in computer science, academic vs. industry career paths, the tech and living scenes in different cities, and Game of Thrones, to name a few. The YugaByte team has many more talented, experienced people, and over the 4 months I got to pick their brains about investing, entrepreneurship, working at tech giants vs. startups, post-grad options, must-see experiences in California, and so much more.

Overall, the past 4 months have been some of the best I’ve ever experienced, primarily due to my amazing internship at YugaByte. All my negative expectations of a startup were shattered, and I lost none of the advantages. I could wear many hats and work across the stack, and I was given full responsibility for my work, which allowed me to greatly improve my technical skills. Working with the incredible team at the company also helped me develop in other areas, and I’m a lot more confident in what I want my future to look like. I would unreservedly recommend YugaByte as a great environment for others, and I would love to come back full-time after I graduate.

What’s Next?

  • Interested in interning at YugaByte? Drop us a line.
  • Compare YugaByte DB in depth to databases like CockroachDB, Google Cloud Spanner and MongoDB.
  • Get started with YugaByte DB on macOS, Linux, Docker, and Kubernetes.
  • Contact us to learn more about licensing, pricing or to schedule a technical overview.
