Data Analyst Interview Questions
Interview Questions and Answers
1. What does a data analyst do?
A data analyst collects and studies data to find useful patterns and insights that help businesses make better decisions. They use tools like SQL, Excel, and dashboards to present findings clearly.
2. Why is EDA important?
Exploratory Data Analysis helps you understand the dataset before making decisions. It allows you to detect errors, understand trends, and ensure your analysis is based on reliable data.
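As a rough illustration, a first EDA pass often just counts missing values and compares basic summary statistics. The sketch below uses Python's standard library on a made-up list of daily order counts (the data and variable names are hypothetical):

```python
import statistics

# Hypothetical daily order counts; None marks a missing reading.
orders = [12, 15, 14, None, 13, 250, 16]

present = [x for x in orders if x is not None]
missing_count = len(orders) - len(present)

mean_value = statistics.mean(present)
median_value = statistics.median(present)

# A mean far above the median hints at an outlier worth investigating.
print("missing:", missing_count)
print("mean:", round(mean_value, 2), "median:", median_value)
print("max:", max(present))
```

Here the single value of 250 pulls the mean well above the median, which is the kind of signal that prompts a closer look before any analysis is trusted.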
3. What is data wrangling?
Data wrangling is the process of cleaning and transforming raw, unorganized data into a structured format that is ready for analysis.
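A minimal sketch of what this can look like in practice, assuming a few made-up raw records (the field names and values are hypothetical): the snippet trims whitespace, normalizes capitalization, converts amounts to integers, and drops rows that cannot be repaired.

```python
# Hypothetical raw records, as they might arrive from a manual export.
raw_rows = [
    {"name": "  alice ", "city": "PUNE", "amount": "1,200"},
    {"name": "Bob", "city": " mumbai", "amount": "850"},
    {"name": "", "city": "Delhi", "amount": "n/a"},  # unusable row
]

def clean_row(row):
    """Return a tidy copy of the row, or None if it cannot be repaired."""
    name = row["name"].strip().title()
    city = row["city"].strip().title()
    amount_text = row["amount"].replace(",", "")
    if not name or not amount_text.isdigit():
        return None
    return {"name": name, "city": city, "amount": int(amount_text)}

clean_rows = [r for r in (clean_row(row) for row in raw_rows) if r is not None]
print(clean_rows)
```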
4. Steps in an analytics project
- Understand the business problem
- Gather relevant data
- Clean and explore the data
- Analyze and visualize
- Share insights and recommendations
5. Common challenges in data analysis
- Poor quality data
- Confusing requirements
- Incorrect joins
- Data access limitations
- Changing business needs
6. WHERE vs HAVING
WHERE is used to filter records before grouping, while HAVING is applied after grouping to filter aggregated results.
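The difference can be seen in a small example. This sketch runs SQL through Python's built-in sqlite3 module against a hypothetical orders table (table and column names are assumptions made for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, status TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("Asha", "paid", 100),
        ("Asha", "paid", 300),
        ("Ravi", "paid", 50),
        ("Ravi", "cancelled", 900),
    ],
)

query = """
SELECT customer, SUM(amount) AS total
FROM orders
WHERE status = 'paid'        -- row filter, applied before grouping
GROUP BY customer
HAVING SUM(amount) > 200     -- group filter, applied after aggregation
ORDER BY customer
"""
where_having_rows = conn.execute(query).fetchall()
print(where_having_rows)
```

The cancelled order is removed by WHERE before any totals are computed, and Ravi's remaining total of 50 is then removed by HAVING.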
7. INNER JOIN vs LEFT JOIN
INNER JOIN returns only the rows that have a match in both tables. LEFT JOIN returns every row from the left table plus the matching rows from the right table; where there is no match, the right-side columns are NULL.
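A short sketch using Python's sqlite3 module shows both joins side by side. The customers and orders tables are hypothetical, and one customer deliberately has no order:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER, name TEXT);
CREATE TABLE orders (customer_id INTEGER, amount INTEGER);
INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ravi'), (3, 'Meera');
INSERT INTO orders VALUES (1, 100), (2, 250);
""")

inner_rows = conn.execute("""
    SELECT c.name, o.amount
    FROM customers AS c
    INNER JOIN orders AS o ON o.customer_id = c.id
    ORDER BY c.id
""").fetchall()

left_rows = conn.execute("""
    SELECT c.name, o.amount
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    ORDER BY c.id
""").fetchall()

print(inner_rows)  # Meera is absent: she has no matching order
print(left_rows)   # Meera is kept, with None (NULL) for amount
```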
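8. Can an alias be used in WHERE?
In standard SQL, no. A column alias is defined in the SELECT clause, and WHERE is logically evaluated before SELECT, so the alias does not exist yet when the filter runs. Some databases relax this rule, so the behavior can vary by engine.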
9. UNION vs INTERSECT vs EXCEPT
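- UNION combines the rows of two queries and removes duplicates (UNION ALL keeps them)
- INTERSECT returns only the rows that appear in both queries
- EXCEPT returns the rows from the first query that do not appear in the second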
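The three operators can be compared on the same pair of inputs. This is a minimal sketch using Python's sqlite3 module; the jan and feb tables and the run helper are hypothetical names chosen for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE jan (customer TEXT);
CREATE TABLE feb (customer TEXT);
INSERT INTO jan VALUES ('Asha'), ('Ravi');
INSERT INTO feb VALUES ('Ravi'), ('Meera');
""")

def run(operator):
    """Apply one set operator to the two customer lists."""
    sql = (
        f"SELECT customer FROM jan {operator} "
        "SELECT customer FROM feb ORDER BY customer"
    )
    return [row[0] for row in conn.execute(sql)]

union_rows = run("UNION")          # customers seen in either month
intersect_rows = run("INTERSECT")  # customers seen in both months
except_rows = run("EXCEPT")        # customers seen in jan but not in feb

print(union_rows, intersect_rows, except_rows)
```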
10. Correlated vs Non-correlated subquery
A non-correlated subquery runs independently of the outer query and can be evaluated once. A correlated subquery references columns from the outer query, so it is logically evaluated once for each outer row.
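To make the distinction concrete, the sketch below (Python's sqlite3 module, with a hypothetical employees table) compares each salary against the company-wide average using a non-correlated subquery, and against the employee's own department average using a correlated one:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (name TEXT, dept TEXT, salary INTEGER);
INSERT INTO employees VALUES
    ('Asha', 'Sales', 50000),
    ('Ravi', 'Sales', 70000),
    ('Meera', 'Tech', 90000),
    ('Karan', 'Tech', 80000);
""")

# Non-correlated: the inner query has no reference to the outer row.
above_company_avg = conn.execute("""
    SELECT name FROM employees
    WHERE salary > (SELECT AVG(salary) FROM employees)
    ORDER BY name
""").fetchall()

# Correlated: the inner query refers to e.dept from the outer row.
above_dept_avg = conn.execute("""
    SELECT name FROM employees AS e
    WHERE salary > (SELECT AVG(salary) FROM employees WHERE dept = e.dept)
    ORDER BY name
""").fetchall()

print(above_company_avg)
print(above_dept_avg)
```

The two queries return different people because the correlated version recomputes the average for each row's department.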
11. ROW_NUMBER vs RANK vs DENSE_RANK
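ROW_NUMBER assigns a unique sequential number to every row, even when values tie. RANK gives tied rows the same rank and skips the following numbers (1, 1, 3). DENSE_RANK gives tied rows the same rank without leaving gaps (1, 1, 2).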
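The three functions are easiest to compare on data with a tie. This sketch uses Python's sqlite3 module and a hypothetical scores table; it assumes the bundled SQLite is version 3.25 or later, which is when window functions were added:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE scores (player TEXT, points INTEGER);
INSERT INTO scores VALUES ('A', 90), ('B', 90), ('C', 80), ('D', 70);
""")

ranking_rows = conn.execute("""
    SELECT player,
           points,
           ROW_NUMBER() OVER (ORDER BY points DESC, player) AS row_num,
           RANK()       OVER (ORDER BY points DESC)         AS rnk,
           DENSE_RANK() OVER (ORDER BY points DESC)         AS dense_rnk
    FROM scores
    ORDER BY points DESC, player
""").fetchall()

# Columns: player, points, row_num, rnk, dense_rnk
for row in ranking_rows:
    print(row)
```

Players A and B tie on 90 points, so player C gets rank 3 under RANK but rank 2 under DENSE_RANK.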
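12. What is a CTE?
A CTE (Common Table Expression) is a named temporary result set, defined with the WITH keyword, that exists only for the duration of a single query. It is used to break complex SQL into readable steps.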
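A minimal sketch, using Python's sqlite3 module and a hypothetical orders table: the CTE named customer_totals computes per-customer totals, and the main query then filters them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (customer TEXT, amount INTEGER);
INSERT INTO orders VALUES ('Asha', 100), ('Asha', 300), ('Ravi', 50);
""")

cte_rows = conn.execute("""
    WITH customer_totals AS (
        SELECT customer, SUM(amount) AS total
        FROM orders
        GROUP BY customer
    )
    SELECT customer, total
    FROM customer_totals
    WHERE total > 200
""").fetchall()

print(cte_rows)
```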
13. Window function
A window function performs a calculation across a set of related rows without collapsing them into a single row, as GROUP BY would. Typical uses are ranking and running totals.
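A running total is a common case. The sketch below uses Python's sqlite3 module with a hypothetical daily_sales table, and assumes SQLite 3.25 or later for window function support:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE daily_sales (day INTEGER, amount INTEGER);
INSERT INTO daily_sales VALUES (1, 100), (2, 50), (3, 200);
""")

running_rows = conn.execute("""
    SELECT day, amount, SUM(amount) OVER (ORDER BY day) AS running_total
    FROM daily_sales
    ORDER BY day
""").fetchall()

# Every input row is kept; the total accumulates alongside it.
for row in running_rows:
    print(row)
```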
14. Stored procedure
A stored procedure is a pre-written SQL program stored in the database that can be executed multiple times.
15. Nth highest value
Rank the values in descending order with DENSE_RANK and filter for the row where the rank equals n. DENSE_RANK is a good fit because tied values share one rank and leave no gaps.
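One way this might look, sketched with Python's sqlite3 module (SQLite 3.25 or later assumed). The employees table and the nth_highest_salary helper are hypothetical names used only for this example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (name TEXT, salary INTEGER);
INSERT INTO employees VALUES
    ('Asha', 90000), ('Ravi', 90000), ('Meera', 70000), ('Karan', 60000);
""")

def nth_highest_salary(n):
    """Return the nth highest distinct salary, or None if there is none."""
    row = conn.execute("""
        SELECT DISTINCT salary
        FROM (
            SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk
            FROM employees
        )
        WHERE rnk = ?
    """, (n,)).fetchone()
    return row[0] if row else None

second_highest = nth_highest_salary(2)
print(second_highest)
```

Because two employees share the top salary, RANK would skip rank 2 entirely here, while DENSE_RANK still assigns it to the next distinct value.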
16. Normal distribution
It is a symmetric, bell-shaped distribution in which most values cluster around the mean, and the mean, median, and mode are equal.
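How strongly values cluster around the mean can be checked numerically. This sketch uses NormalDist from Python's statistics module to compute the share of values within 1, 2, and 3 standard deviations of the mean for a standard normal distribution; share_within is a helper name introduced for the example:

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Probability that a value lies within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
```

The results are approximately 68%, 95%, and 99.7%, which is why this pattern is often called the 68-95-99.7 rule.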
17. Correlation vs causation
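Correlation means two variables tend to move together. Causation means a change in one variable directly produces a change in the other. Correlation alone does not establish causation, because the relationship may be coincidental or driven by a third factor.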
18. Type I vs Type II error
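A Type I error is a false positive: rejecting the null hypothesis when it is actually true, that is, detecting an effect that does not exist. A Type II error is a false negative: failing to reject the null hypothesis when it is actually false, that is, missing an effect that does exist.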
19. Variance vs Covariance
Variance measures spread of a single variable, while covariance measures how two variables change together.
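A small worked example may help. The sketch below computes the sample variance of one variable and the sample covariance of two, using made-up study hours and test scores; sample_covariance is a helper written for this example rather than a library function:

```python
from statistics import mean, variance

# Hypothetical paired observations.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

def sample_covariance(xs, ys):
    """Average product of paired deviations from the means (n - 1 divisor)."""
    mean_x, mean_y = mean(xs), mean(ys)
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)

hours_variance = variance(hours)
hours_scores_cov = sample_covariance(hours, scores)

print(hours_variance)
print(hours_scores_cov)  # positive: scores tend to rise with hours
```

The covariance of a variable with itself equals its variance, so sample_covariance(hours, hours) gives the same value as variance(hours).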
20. Overfitting vs Underfitting
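Overfitting happens when a model memorizes the training data, including its noise, and therefore performs poorly on new data. Underfitting happens when a model is too simple to capture the underlying patterns, so it performs poorly even on the training data.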





