Data Analyst Interview Question
Interview Questions and Answers
1. What does a data analyst do?
A data analyst collects and studies data to find useful patterns and insights that help businesses make better decisions. They use tools like SQL, Excel, and dashboards to present findings clearly.
2. Why is EDA important?
Exploratory Data Analysis helps you understand the dataset before making decisions. It allows you to detect errors, understand trends, and ensure your analysis is based on reliable data.
3. What is data wrangling?
It is the process of transforming raw and unorganized data into a structured format so it can be easily analyzed.
4. Steps in an analytics project
- Understand the business problem
- Gather relevant data
- Clean and explore the data
- Analyze and visualize
- Share insights and recommendations
5. Common challenges in data analysis
- Poor quality data
- Confusing requirements
- Incorrect joins
- Data access limitations
- Changing business needs
6. WHERE vs HAVING
WHERE is used to filter records before grouping, while HAVING is applied after grouping to filter aggregated results.
7. INNER JOIN vs LEFT JOIN
INNER JOIN returns only matching records from both tables. LEFT JOIN returns all records from the left table and matching ones from the right.
8. Can alias be used in WHERE?
No, because WHERE runs before SELECT, so the alias is not yet created.
9. UNION vs INTERSECT vs EXCEPT
- UNION combines datasets
- INTERSECT returns common data
- EXCEPT shows differences
10. Correlated vs Non-correlated subquery
A non-correlated query runs independently, while a correlated query depends on the outer query and runs repeatedly.
11. ROW_NUMBER vs RANK vs DENSE_RANK
ROW_NUMBER gives unique numbers, RANK skips numbers when tied, and DENSE_RANK does not skip numbers.
12. What is CTE?
A CTE is a temporary result set used to simplify complex SQL queries and improve readability.
13. Window function
It performs calculations across rows without grouping them, like ranking or running totals.
14. Stored procedure
A stored procedure is a pre-written SQL program stored in the database that can be executed multiple times.
15. Nth highest value
You can use ranking functions like DENSE_RANK to identify the nth highest value.
16. Normal distribution
It is a symmetric distribution where most values are centered around the mean.
17. Correlation vs causation
Correlation shows a relationship, but causation proves that one variable directly affects another.
18. Type I vs Type II error
Type I error means detecting something that is not true. Type II error means missing something that actually exists.
19. Variance vs Covariance
Variance measures spread of a single variable, while covariance measures how two variables change together.
20. Overfitting vs Underfitting
Overfitting happens when the model memorizes data, while underfitting happens when it fails to capture patterns.
21 . What is the difference between INNER JOIN, LEFT JOIN, and RIGHT JOIN?
INNER JOIN returns matching records from both tables. LEFT JOIN returns all records from the left table and matching rows from the right table, while RIGHT JOIN returns all rows from the right table and matching rows from the left table.
22.How do you find duplicate records in a table?
We use GROUP BY with HAVING COUNT(*) > 1 to identify duplicate values.
Example:
SELECT name, COUNT(*) FROM employee GROUP BY name HAVING COUNT(*) > 1;
23. Write a SQL query to find the second highest salary in a table.
SELECT MAX(salary) FROM employee WHERE salary < (SELECT MAX(salary) FROM employee);
24 . Explain the difference between WHERE and HAVING.
WHERE filters rows before grouping is done, while HAVING filters grouped data after using GROUP BY.
25 . What are primary keys and foreign keys?
A Primary Key uniquely identifies each record in a table and cannot contain NULL values. A Foreign Key creates a relationship between two tables by referring to the primary key of another table.
26 . What are Pivot Tables, and how do you use them?
Pivot Tables are used to summarize and analyze large amounts of data quickly. We create them by selecting data and going to Insert → Pivot Table to arrange fields into rows, columns, and values.
27 . Explain VLOOKUP and INDEX-MATCH.
VLOOKUP searches for a value in the first column and returns matching data from another column. INDEX-MATCH is a more flexible lookup method that works faster and can search in any direction.
28 . How do you handle missing data in Excel?
Missing data can be handled by removing blank rows, replacing blanks with default values, or using functions like IFERROR and filters to identify missing entries.
29 . What are Conditional Formatting and its use cases?
Conditional Formatting automatically changes cell formatting based on conditions. It is used to highlight duplicates, low/high values, deadlines, or trends using colors and icons.
30 . How do you create charts for data visualization in Excel?
Select the data range, go to Insert → Charts, and choose a chart type like bar, line, or pie chart. Charts help present data visually for better understanding and analysis.
31 . What is the difference between mean, median, and mode?
Mean is the average of all values, median is the middle value in sorted data, and mode is the value that appears most frequently in a dataset.
32 . Explain standard deviation and variance.
Variance measures how far data values are spread from the mean, while standard deviation is the square root of variance and shows data spread in an easier-to-understand form.
33 . What is correlation vs causation?
Correlation means two variables are related or move together, while causation means one variable directly causes changes in another variable.
34 . What is a p-value in hypothesis testing?
A p-value shows the probability that the result happened by chance. A smaller p-value (usually less than 0.05) indicates stronger evidence against the null hypothesis.
35 . What is the difference between population and sample data?
Looking to start your career in data analytics?Â
Join the best google data analytics institute and learn everything from basics to advanced concepts with real-time projects.
Get hands-on training, expert mentorship, and placement support to kickstart your career in data analytics.
Enroll now and take the first step toward becoming a Data Analyst!





