icon Register for Oracle Exadata Live Session on 05 May at 7.30 PM IST ENROLL NOW

Data Analyst Interview Question

Breadcrumb Abstract Shape
Breadcrumb Abstract Shape
Breadcrumb Abstract Shape
Breadcrumb Abstract Shape
Breadcrumb Abstract Shape
Breadcrumb Abstract Shape
interview questions and answers
  • 15 Apr, 2026
  • 0 Comments
  • 5 Mins Read

Data Analyst Interview Question

Interview Questions and Answers

1. What does a data analyst do?

A data analyst collects and studies data to find useful patterns and insights that help businesses make better decisions. They use tools like SQL, Excel, and dashboards to present findings clearly.

2. Why is EDA important?

Exploratory Data Analysis helps you understand the dataset before making decisions. It allows you to detect errors, understand trends, and ensure your analysis is based on reliable data.

3. What is data wrangling?

It is the process of transforming raw and unorganized data into a structured format so it can be easily analyzed.

4. Steps in an analytics project

  • Understand the business problem
  • Gather relevant data
  • Clean and explore the data
  • Analyze and visualize
  • Share insights and recommendations

5. Common challenges in data analysis

  • Poor quality data
  • Confusing requirements
  • Incorrect joins
  • Data access limitations
  • Changing business needs

6. WHERE vs HAVING

WHERE is used to filter records before grouping, while HAVING is applied after grouping to filter aggregated results.

7. INNER JOIN vs LEFT JOIN

INNER JOIN returns only matching records from both tables. LEFT JOIN returns all records from the left table and matching ones from the right.

8. Can alias be used in WHERE?

No, because WHERE runs before SELECT, so the alias is not yet created.

9. UNION vs INTERSECT vs EXCEPT

  • UNION combines datasets
  • INTERSECT returns common data
  • EXCEPT shows differences

10. Correlated vs Non-correlated subquery

A non-correlated query runs independently, while a correlated query depends on the outer query and runs repeatedly.

11. ROW_NUMBER vs RANK vs DENSE_RANK

ROW_NUMBER gives unique numbers, RANK skips numbers when tied, and DENSE_RANK does not skip numbers.

12. What is CTE?

A CTE is a temporary result set used to simplify complex SQL queries and improve readability.

13. Window function

It performs calculations across rows without grouping them, like ranking or running totals.

14. Stored procedure

A stored procedure is a pre-written SQL program stored in the database that can be executed multiple times.

15. Nth highest value

You can use ranking functions like DENSE_RANK to identify the nth highest value.

16. Normal distribution

It is a symmetric distribution where most values are centered around the mean.

17. Correlation vs causation

Correlation shows a relationship, but causation proves that one variable directly affects another.

18. Type I vs Type II error

Type I error means detecting something that is not true. Type II error means missing something that actually exists.

19. Variance vs Covariance

Variance measures spread of a single variable, while covariance measures how two variables change together.

20. Overfitting vs Underfitting

Overfitting happens when the model memorizes data, while underfitting happens when it fails to capture patterns.

21 . What is the difference between INNER JOIN, LEFT JOIN, and RIGHT JOIN?

INNER JOIN returns matching records from both tables. LEFT JOIN returns all records from the left table and matching rows from the right table, while RIGHT JOIN returns all rows from the right table and matching rows from the left table.

22.How do you find duplicate records in a table?

We use GROUP BY with HAVING COUNT(*) > 1 to identify duplicate values.
Example:

SELECT name, COUNT(*)
FROM employee
GROUP BY name
HAVING COUNT(*) > 1;

23. Write a SQL query to find the second highest salary in a table.

SELECT MAX(salary)
FROM employee
WHERE salary < (SELECT MAX(salary) FROM employee);

24 . Explain the difference between WHERE and HAVING.

WHERE filters rows before grouping is done, while HAVING filters grouped data after using GROUP BY.

25 . What are primary keys and foreign keys?

A Primary Key uniquely identifies each record in a table and cannot contain NULL values. A Foreign Key creates a relationship between two tables by referring to the primary key of another table.

26 . What are Pivot Tables, and how do you use them?

Pivot Tables are used to summarize and analyze large amounts of data quickly. We create them by selecting data and going to Insert → Pivot Table to arrange fields into rows, columns, and values.

27 . Explain VLOOKUP and INDEX-MATCH.

VLOOKUP searches for a value in the first column and returns matching data from another column. INDEX-MATCH is a more flexible lookup method that works faster and can search in any direction.

28 . How do you handle missing data in Excel?

Missing data can be handled by removing blank rows, replacing blanks with default values, or using functions like IFERROR and filters to identify missing entries.

29 . What are Conditional Formatting and its use cases?

Conditional Formatting automatically changes cell formatting based on conditions. It is used to highlight duplicates, low/high values, deadlines, or trends using colors and icons.

30 . How do you create charts for data visualization in Excel?

Select the data range, go to Insert → Charts, and choose a chart type like bar, line, or pie chart. Charts help present data visually for better understanding and analysis.

31 . What is the difference between mean, median, and mode?

Mean is the average of all values, median is the middle value in sorted data, and mode is the value that appears most frequently in a dataset.

32 . Explain standard deviation and variance.

Variance measures how far data values are spread from the mean, while standard deviation is the square root of variance and shows data spread in an easier-to-understand form.

33 . What is correlation vs causation?

Correlation means two variables are related or move together, while causation means one variable directly causes changes in another variable.

34 . What is a p-value in hypothesis testing?

A p-value shows the probability that the result happened by chance. A smaller p-value (usually less than 0.05) indicates stronger evidence against the null hypothesis.

35 . What is the difference between population and sample data?

Population includes all data or individuals being studied, while a sample is a smaller subset taken from the population for analysis.

36 . What are the most common tools for data visualization?

Common data visualization tools include Microsoft Excel, Power BI, Tableau, Google Data Studio, and Python libraries like Matplotlib and Seaborn. These tools help present data in charts, graphs, and dashboards.

37 . Explain the difference between Tableau and Power BI.

Tableau is known for advanced visualization and better handling of large datasets, while Power BI is more user-friendly and integrates well with Microsoft products like Excel and Azure.

38 . What is a dashboard, and what makes it effective?

A dashboard is a visual report that displays important data and metrics in one place. An effective dashboard is simple, interactive, and shows clear insights for decision-making.

39 . How do you choose the right chart type for data?

Choose charts based on the data type and purpose. Bar charts compare values, line charts show trends over time, and pie charts display percentage distribution.

40 . What are KPIs (Key Performance Indicators)?

KPIs are measurable values used to track business performance and goals. Examples include sales growth, customer retention, and profit percentage.

Looking to start your career in data analytics? 
Join the best google data analytics institute and learn everything from basics to advanced concepts with real-time projects.

Get hands-on training, expert mentorship, and placement support to kickstart your career in data analytics.

Enroll now and take the first step toward becoming a Data Analyst!

lets talk - learnomate helpdesk

Book a Free Demo