
Interview questions

“Welcome to Continuum's open source repository for interview questions!”

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

“We typically expect job applicants to spend around 1-2 hours answering these questions as part of the initial application process (i.e. at the time you submit your resume).”

“You should view these questions as an opportunity to showcase your strengths, skill set, experiences, past successes (and failures) and anything else you think helps portray who you are and why you think you'd be a good fit within Continuum. We feel this is a better approach for assessing candidates than afforded by the traditional resume-submission process.”

“You can pick any questions you like. We anticipate the types of questions you choose to answer will be dependent upon the individual; are you a generalist or specialist? Low-level, bit-slinging C/C++ programmer, front-end engineer, quantitative developer, statistician, data scientist, or database architect?”

“If you don't feel like the existing questions allow you to best depict your strengths – you are more than welcome to write new questions. Extra points if you submit a pull request (against this GitHub repo) for the new questions.”

Software Engineering Questions

Bookshelf

“Send us a picture of your technical bookshelf. (We have a very liberal definition of what constitutes a “bookshelf”. Piles of books stacked on top of each other on the floor of your garage still count.)”

Relational Databases

You're in charge of designing the table above (i.e. you issue the create table). What techniques would you use for storing such a large amount of data?

What are some options for bulk loading data? Why are these typically faster than just doing lots of inserts?
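
As a starting point for thinking about this question, here is a minimal sketch of one contributing factor: per-row transaction overhead. It uses Python's built-in sqlite3 module and an in-memory database. The table name `readings` and the helper functions are hypothetical, and vendor bulk loaders typically go further than this sketch does.

```python
import sqlite3

INSERT_SQL = "INSERT INTO readings (id, value) VALUES (?, ?)"

def make_db():
    # Hypothetical table, used only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, value REAL)")
    return conn

def load_row_by_row(conn, rows):
    # One statement execution and one commit per row:
    # every row pays the cost of its own transaction.
    for row in rows:
        conn.execute(INSERT_SQL, row)
        conn.commit()

def load_in_bulk(conn, rows):
    # One transaction for the whole batch; the statement is
    # prepared once and re-executed for each parameter tuple.
    with conn:
        conn.executemany(INSERT_SQL, rows)

rows = [(i, i * 0.5) for i in range(1000)]
slow, fast = make_db(), make_db()
load_row_by_row(slow, rows)
load_in_bulk(fast, rows)
```

Both functions leave the table in the same state; only the number of transactions and statement round trips differs. The gap is far more visible against an on-disk or networked database than against an in-memory one.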

What are some ways you can do windowed result sets? Why do you need to do windowing? How do different vendors offer different approaches for doing windowing?
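
To make the question concrete, the sketch below shows two common ways of fetching one "window" of a larger result: offset-based paging and keyset paging. It uses Python's sqlite3 module; the `events` table and both helper functions are hypothetical. The `LIMIT ... OFFSET` syntax shown is SQLite's, and other vendors spell the same idea differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with 25 rows, ids 1..25.
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO events (id, name) VALUES (?, ?)",
    [(i, f"event-{i}") for i in range(1, 26)],
)

def page_by_offset(conn, page, size):
    # Offset paging: the database still has to walk past the skipped rows.
    return conn.execute(
        "SELECT id, name FROM events ORDER BY id LIMIT ? OFFSET ?",
        (size, page * size),
    ).fetchall()

def page_by_keyset(conn, last_id, size):
    # Keyset paging: resume after the last key seen, so an index on id
    # can seek straight to the start of the window.
    return conn.execute(
        "SELECT id, name FROM events WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, size),
    ).fetchall()

second_page_offset = page_by_offset(conn, page=1, size=10)
second_page_keyset = page_by_keyset(conn, last_id=10, size=10)
```

Both calls return the rows with ids 11 through 20. Keyset paging needs a stable, unique sort key, which is the trade-off for avoiding the scan over skipped rows.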

How does Oracle differ from other vendors with regard to automatic ID generation? What are some of the advantages and disadvantages of the way they do it?

Where would you expect to find b-trees behind the scenes? Why are b-trees used in these instances?

Insert row if primary key doesn't exist, update if it does – what options are available for achieving this? Do different vendors provide different approaches?
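
One vendor-specific option, shown here only as an illustration, is SQLite's `INSERT ... ON CONFLICT ... DO UPDATE` clause. The sketch below uses Python's sqlite3 module and assumes the bundled SQLite library is version 3.24 or newer. The `users` table and the `upsert_user` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table, used only for this illustration.
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")

def upsert_user(conn, user_id, email):
    # Insert the row; if the primary key already exists, update it instead.
    # "excluded" refers to the row that the INSERT tried to add.
    with conn:
        conn.execute(
            "INSERT INTO users (id, email) VALUES (?, ?) "
            "ON CONFLICT(id) DO UPDATE SET email = excluded.email",
            (user_id, email),
        )

upsert_user(conn, 1, "a@example.com")  # key absent: inserts
upsert_user(conn, 1, "b@example.com")  # key present: updates
```

Doing this in a single statement avoids the race between a separate existence check and the subsequent insert or update.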

You have read-only access to a production datamart/warehouse. You have read/write access to a powerful dev box that also has the same vendor database available (which you have more control over – i.e. can create databases and administer locally). You want to get the data from prod to your dev box so you can slice and dice however you see fit (say, in preparation for a new data mining project). How do you do it? (You can list multiple things you'd try.)

You run explain plan on a query; the IO cost is 84819818, the CPU cost is 84800511 – how optimal is this query likely to be? What sort of query (or queries) do you think it might be?

You run explain plan on a query; the IO cost is 90204, the CPU cost is 981098091981 – how optimal is this query likely to be? What sort of query (or queries) do you think it might be?

Oracle PL/SQL: what's an autonomous transaction? When would you typically use one?

Oracle-specific data compression – what are your options? What are some of the areas where you'd typically encounter compression? What are the advantages and disadvantages?

When would you typically encounter histograms?

Things We Would Never Ask

A tongue-in-cheek collection of questions we'd never ask, which probably means we've been asked them ourselves in previous job interviews. (What's being asked is not the issue; it's the paper/whiteboard/offline environment in which the interviewer wants the candidate to implement the solution.)

Reverse this string. In C. On that whiteboard. Must compile.

Implement queue, given stack, in Python. In this shared Google Doc link we send you. Include doctests. All doctests must pass on first attempt. No you can't run it through Python locally first.
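
For reference, and with the luxury of a local interpreter, here is a minimal sketch of the usual two-stack approach, using plain Python lists as the stacks. The class name `StackQueue` and its method names are our own choices for illustration.

```python
class StackQueue:
    """FIFO queue built from two LIFO stacks (plain Python lists).

    >>> q = StackQueue()
    >>> q.enqueue(1); q.enqueue(2); q.enqueue(3)
    >>> q.dequeue()
    1
    >>> q.enqueue(4)
    >>> [q.dequeue(), q.dequeue(), q.dequeue()]
    [2, 3, 4]
    >>> len(q)
    0
    """

    def __init__(self):
        self._inbox = []   # receives newly enqueued items
        self._outbox = []  # holds items in dequeue order (top = oldest)

    def enqueue(self, item):
        self._inbox.append(item)

    def dequeue(self):
        if not self._outbox:
            # Pouring one stack into the other reverses the order,
            # so the oldest item ends up on top of the outbox.
            while self._inbox:
                self._outbox.append(self._inbox.pop())
        if not self._outbox:
            raise IndexError("dequeue from empty queue")
        return self._outbox.pop()

    def __len__(self):
        return len(self._inbox) + len(self._outbox)


if __name__ == "__main__":
    import doctest
    doctest.testmod()
```

Each item is pushed and popped at most twice, so a sequence of operations costs amortized constant time per operation.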

What's the big-oh of (quicksort | mergesort | bubblesort | timsort)? No you can't Google it.

License

Creative Commons License: Interview Questions by Continuum Analytics, Inc. is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Based on a work at https://github.com/ContinuumIO/interview-questions.

Fair Use Source: https://github.com/ContinuumIO/interview-questions