Master in Data Analyst | Join Free Webinar on 15 Sep 2025 at 7 PM IST | Register for Free Demo

Data Analyst Interview Questions

Breadcrumb Abstract Shape
Breadcrumb Abstract Shape
Breadcrumb Abstract Shape
Breadcrumb Abstract Shape
Breadcrumb Abstract Shape
Breadcrumb Abstract Shape
Data Analyst unterview Questions
  • User AvatarPradip
  • 18 Aug, 2025
  • 0 Comments
  • 48 Mins Read

Data Analyst Interview Questions

Data Analyst Interview Questions

EXCEL:

Q1. Differentiate between Power BI and Excel.

Answer:

  • Tabular Reports: Excel is better; Power BI struggles with purely tabular layouts.

  • Duplicate Tables: Excel supports; Power BI doesn’t.

  • Analytics: Excel supports advanced analytics (pivot tables, formulas), while Power BI focuses on simple analytics & KPIs.

  • Interactivity: Power BI allows cross-filtering, Excel does not.

  • Use Cases:

    • Excel – detailed calculations, financial models.

    • Power BI – dashboards, KPIs, interactive analytics.

Q2. Tell us about the Excel add-ins for Power BI.

Answer:
The Excel BI Toolkit includes:

  1. Power Query – For data connection & transformation.

  2. Power Pivot – For data modeling & calculations.

  3. Power View – For interactive visualization.

  4. Power Map – For 3D geographic analysis.

Q3. Why INDEX + MATCH is Better Than VLOOKUP

While VLOOKUP is many users’ first tool for looking up data, INDEX + MATCH is widely considered a superior, more flexible, and more powerful combination. Here’s a detailed breakdown of why.

1. Works Left of the Lookup Column (The Biggest Advantage)

  • VLOOKUP Limitation: The VLOOKUP(lookup_value, table_array, col_index_num, [range_lookup]) function can only look for values in the left-most column of the table_array and return a value from a column to the right. If your data is structured with the identifier (e.g., Employee ID) not in the first column, VLOOKUP fails.

  • INDEX + MATCH Solution: MATCH() finds the position (row number) of your lookup value anywhere in a column. INDEX() then returns the value from that same row number in any column you specify, regardless of whether it’s to the left or right. This gives you complete freedom in structuring your worksheets.

2. Better Performance on Large Datasets

  • VLOOKUP’s Inefficiency: VLOOKUP examines the entire table_array you provide. On a 100,000-row dataset where your table_array is columns A through Z, VLOOKUP is scanning all 100,000 rows and all 26 columns, even if you just want to return data from column C. This is computationally expensive.

  • INDEX + MATCH Efficiency: You define two precise ranges:

    • MATCH() only looks in the single, specific lookup column (e.g., A:A).

    • INDEX() only looks in the single, specific return column (e.g., C:C).
      This drastically reduces the number of cells Excel needs to process, leading to faster calculation times, especially in large workbooks.

3. Inserting or Deleting Columns is Safer

  • VLOOKUP’s Fragility: VLOOKUP uses a static column index number (e.g., 3 to return the 3rd column). If you insert or delete a column within your table_array, this number can break, and you might start returning data from the wrong column without immediately realizing it.

  • INDEX + MATCH’s Robustness: Since you directly reference the entire return column (e.g., C:C), inserting or deleting other columns elsewhere on the sheet does not affect the formula. It will always pull from column C.

4. Exact Match by Default

  • While both can do exact match (FALSE or 0), VLOOKUP’s fourth argument is [range_lookup], which defaults to TRUE (approximate match). This is a common source of errors for beginners who forget to set it to FALSE.

  • The MATCH() function’s third argument for exact match is 0, which has no default and must be explicitly stated, making the intent clearer.

Example Explained:
=INDEX(Salary, MATCH("John", Name, 0))

  • MATCH("John", Name, 0): This part is the “finder.”

    • It searches for the exact text “John” (the 0 signifies exact match) within the range named Name.

    • Let’s say “John” is in the 5th row of the Name range. The MATCH function returns the number 5.

  • INDEX(Salary, 5): This part is the “retriever.”

    • It goes to the range named Salary.

    • It finds the value in the 5th row of that range and returns it.

  • The result is John’s salary. The Name and Salary ranges can be anywhere on the sheet, in any order.


Q4. Create a Dynamic Dashboard

A dynamic dashboard in Excel is an interactive report that allows users to quickly visualize and analyze data by filtering and slicing it in real-time.

Core Components & Steps:

1. Foundation: Pivot Tables + Pivot Charts

  • Pivot Tables: Start by creating Pivot Tables from your raw data. They are the engine of your dashboard. They allow you to summarize millions of rows of data into meaningful categories (e.g., Sales by Region, Profit by Product Category).

  • Pivot Charts: Create charts (bar, line, pie, etc.) directly from your Pivot Tables. These visualizations become the core elements of your dashboard. Right-click a Pivot Table -> PivotChart to create one that is automatically linked.

2. Interactivity: Slicers & Timelines

  • Slicers: These are visual buttons that let you filter the data. Instead of digging into filter drop-downs, users can just click on “West” in a Region slicer to see data only for the West region.

    • To Add: Click inside your Pivot Table -> Go to the PivotTable Analyze tab -> Click Insert Slicer -> Choose the fields you want to filter by (e.g., Region, Product, Salesperson).

  • Timelines: A special type of slicer designed specifically for dates. It provides a graphical way to filter data by days, months, quarters, or years.

    • To Add: PivotTable Analyze tab -> Insert Timeline -> Select your date column.

3. Linking Slicers Across Multiple Pivot Tables/Charts (The Key to Dynamism)
This is what makes a dashboard truly cohesive. One click filters every single chart on the page.

  • Right-click on the slicer.

  • Select Report Connections (or Slicer Settings).

  • A dialog box will appear showing all the Pivot Tables in your workbook.

  • Check the boxes for every Pivot Table that you want that slicer to control.

  • Now, when you click an item in the slicer, all the connected Pivot Tables and their corresponding charts will update simultaneously.

Best Practices:

  • Layout: Arrange your charts, tables, and slicers on a single, clean worksheet.

  • Design: Use a consistent color scheme and clear titles. The Dashboard tab should be for presentation only; keep raw data and Pivot Tables on other hidden sheets.


Q5. Power Query in Excel (The Game Changer)

Power Query (known as Get & Transform Data in Excel 2016+) is a powerful data connection and transformation tool. It’s an ETL tool built right into Excel.

ETL: Extract, Transform, Load

1. Extract

  • This is the “E” in ETL. Power Query can connect to a vast array of data sources far beyond what standard Excel can import. This includes:

    • Files: Excel workbooks, CSV/Text, XML, JSON, PDF (preview).

    • Databases: SQL Server, Access, Oracle, MySQL.

    • Cloud & Other Sources: Azure, SharePoint, OData feeds, web pages (scraping data from HTML tables), and many more.

2. Transform

  • This is the “T” in ETL and Power Query’s core strength. This is where you clean, reshape, and prepare your data before it ever touches your worksheet. The original data source remains untouched. Common transformations include:

    • Cleaning: Removing duplicates, trimming spaces, changing data types (text to numbers/dates), replacing values.

    • Shaping: Filtering rows, removing unnecessary columns, splitting columns (e.g., “John Smith” into “First Name” and “Last Name”), merging columns.

    • Restructuring: Grouping data, pivoting (unpivoting is a superpower – turning cross-tab data into a clean table), adding custom columns with formulas (using the M language).

    • Combining: Appending multiple tables (stacking them on top of each other) and merging tables (like a VLOOKUP but between entire queries, much more powerful and robust).

3. Load

  • This is the “L” in ETL. Once you are satisfied with the transformed data, you load it into Excel. You have two main options:

    • Load to Worksheet: Creates a static table on a worksheet.

    • Load to Data Model: Loads the data into the in-memory Power Pivot engine, which is essential for handling very large datasets and creating complex data models with relationships between multiple tables.

Why It’s Revolutionary:

  • Reproducibility & Automation: The entire ETL process you create is saved as a “query.” When your source data is refreshed (e.g., you get a new CSV file every day), you simply right-click the query result in Excel and hit Refresh. Power Query will automatically run all the cleaning steps again on the new data, outputting a perfectly formatted and cleaned table in seconds. This eliminates hours of manual, error-prone work.

  • Data Integrity: Your raw data is never altered. All changes are stored as steps in the query.

Q6. Define Data Validation in Excel

Data Validation is a powerful Excel feature that allows you to control exactly what type of data can be entered into a cell or a range of cells. Its primary purpose is to ensure data integrity and accuracy by preventing users from entering invalid data.

Key Details:

  • How it Works: You set specific rules (criteria) for a cell. If a user tries to enter data that breaks these rules, Excel can reject the entry or show a warning message.

  • Common Validation Criteria:

    • Whole Number / Decimal: Restrict entry to numbers only (e.g., ages must be between 18 and 65).

    • List: Creates a dropdown list from a predefined set of options (e.g., Department: Sales, Marketing, Engineering).

    • Date / Time: Restrict entry to a valid date or time within a specific range.

    • Text Length: Limit the number of characters (e.g., a password must be at least 8 characters).

    • Custom: Use a formula to create complex rules (e.g., ensure a cell value is greater than the value in another cell).

  • Input and Error Messages: You can create a friendly “Input Message” (like a tooltip) that guides the user on what to enter. You can also customize the “Error Alert” that appears when invalid data is entered, making it more informative.

  • Location: Found on the Data tab, in the Data Tools group.

Why it’s important: It eliminates many common data entry errors, standardizes entries, and makes data easier to sort, filter, and analyze later.


Q7. How can you get the current date and time in Excel?

There are two primary methods to get the current date and time: using functions and using a keyboard shortcut.

Method 1: Using Functions (Volatile)

  • For Current Date: Use the TODAY() function.

    • Formula: =TODAY()

    • Returns the current date as per your system’s clock. It does not include time.

  • For Current Date AND Time: Use the NOW() function.

    • Formula: =NOW()

    • Returns the current date and time as per your system’s clock.

Important Note: Both TODAY() and NOW() are volatile functions. This means they recalculate every time the worksheet recalculates (e.g., when you open the file, edit a cell, or press F9). The value will not remain static unless you copy it and use Paste Special > Values.

Method 2: Using Keyboard Shortcuts (Static)

  • For Static Current Date: Press Ctrl + ; (semicolon).

  • For Static Current Time: Press Ctrl + Shift + : (colon).

  • For Static Current Date & Time: Press Ctrl + ;, then press Space, then press Ctrl + Shift + :.

Key Difference: The keyboard shortcuts enter the current date/time as a static value. It will not update automatically; it’s like typing the value in manually. This is useful for timestamps (e.g., recording when a specific entry was made).


Q8. How does the AND() function work in Excel?

The AND() function is a logical function used to test multiple conditions at the same time. It returns either TRUE or FALSE.

  • Syntax: =AND(logical1, [logical2], ...)

    • logical1logical2, etc., are the conditions you want to test. You can test up to 255 conditions.

  • How it Works: The AND function returns TRUE only if ALL the specified conditions are true. If any single condition is false, the entire function returns FALSE.

Example:
You have a student’s score in cell A2. You want to check if they passed, which requires a score above 50 AND below 100.
=AND(A2>50, A2<100)

  • If A2 is 75, the result is TRUE (both conditions are met).

  • If A2 is 120, the result is FALSE (the second condition fails).

  • If A2 is 40, the result is FALSE (the first condition fails).

It is extremely powerful when nested inside other functions like IF to create complex conditional formulas:
=IF(AND(A2>50, A2<100), "Pass", "Fail")


Q9. What is a Macro in Excel?

Macro is a piece of programming code, written in VBA (Visual Basic for Applications), that automates repetitive tasks in Excel. It’s essentially a recorded or written sequence of commands and actions that you can play back whenever you need to perform that task.

Key Details:

  • Recording: The easiest way to create a macro is to use the Macro Recorder (View tab > Macros > Record Macro). You perform the steps once manually, and Excel translates your actions into VBA code.

  • Editing: Recorded macros can be viewed and edited in the VBA Editor (opened by pressing Alt + F11). This allows for more complex and powerful automation.

  • Execution: You can run a macro from the Macros dialog box (View > Macros > View Macros), by assigning it to a button, or a keyboard shortcut (like Ctrl + Shift + C).

  • Use Cases:

    • Formatting large reports consistently.

    • Combining data from multiple sheets or workbooks.

    • Automating complex calculations and data manipulation.

    • Creating custom functions and user forms.

Security Note: Macros can contain harmful code. Excel has security settings to disable them by default. Only enable macros from trusted sources.


Q10. What is VLOOKUP in Excel?

VLOOKUP (Vertical Lookup) is one of Excel’s most popular and powerful lookup and reference functions. It searches for a specific value in the first column of a table and returns a corresponding value from a different column in the same row.

Syntax: =VLOOKUP(lookup_value, table_array, col_index_num, [range_lookup])

  • lookup_value: The value you want to search for (e.g., an Employee ID).

  • table_array: The range of cells that contains the data (the table). The first column of this range must contain the lookup_value.

  • col_index_num: The column number (within the table_array) from which to retrieve the return value. The first column is 1, the second is 2, etc.

  • [range_lookup]: Optional. Use FALSE for an exact match. Use TRUE (or omit) for an approximate match. You will almost always use FALSE.

Example:
You have a product table where Column A has Product ID and Column B has Price.
You want to find the price for product ID “P123”.
=VLOOKUP("P123", A2:B100, 2, FALSE)
This formula looks for “P123” in the range A2:A100 and returns the value from the 2nd column (column B) of the same row.

Limitation: VLOOKUP can only look to the right. It cannot return a value from a column to the left of the lookup column. For more flexibility, use the newer XLOOKUP or INDEX/MATCH combination.


Q11. How can you make a dropdown list in MS Excel?

You create a dropdown list using the Data Validation feature.

Steps:

  1. Prepare your list: Type the options for your dropdown in a single column or row on your worksheet (e.g., in cells Z1:Z5). Alternatively, you can type them directly in the Data Validation box.

  2. Select the cell: Click on the cell where you want the dropdown list to appear.

  3. Open Data Validation: Go to the Data tab > Data Tools group > Click Data Validation.

  4. Set the Criteria:

    • In the Settings tab, under Allow:, select List.

    • In the Source: box, either:

      • Click the range selector icon and select the cells containing your list (e.g., $Z$1:$Z$5).

      • Type the list items directly, separated by commas (e.g., Yes, No, Maybe).

  5. (Optional) Configure the Input Message and Error Alert tabs to guide the user.

  6. Click OK.

The selected cell will now have a dropdown arrow. Clicking it will show the list of options you defined, and the user can only choose from those options.


Q12. How are duplicate entries found in Excel?

There are several methods to find and manage duplicates.

Method 1: Conditional Formatting (For Visual Identification)

This is the quickest way to highlight duplicates.

  1. Select the range of cells you want to check.

  2. Go to the Home tab > Styles group > Conditional Formatting.

  3. Select Highlight Cells Rules > Duplicate Values.

  4. Choose a formatting style (e.g., light red fill) and click OK.
    All duplicate values in the selected range will be instantly highlighted.

Method 2: The Remove Duplicates Tool (For Deletion)

This tool permanently deletes duplicate rows.

  1. Select any cell within your data range.

  2. Go to the Data tab > Data Tools group > Remove Duplicates.

  3. A dialog box appears. If your data has headers, check the “My data has headers” box.

  4. Select the columns you want to check for duplicates. If you select all columns, a duplicate is only found if every value in the row is identical to another row.

  5. Click OK. Excel will remove the duplicate rows and show a message stating how many duplicates were removed and how many unique values remain.

Method 3: Using a Formula (For Flagging or Counting)

You can use a formula to identify if a value is a duplicate.

  • To Flag a Duplicate: The COUNTIF function is common for this.

    • =COUNTIF($A$2:$A$10, A2)>1

    • This formula, entered in cell B2 and copied down, checks how many times the value in A2 appears in the range A2:A10. If the count is greater than 1, it returns TRUE (meaning it’s a duplicate).

  • To List Unique Values: In newer versions of Excel (Office 365), you can use the UNIQUE() function to extract a list of all distinct values from a range.

    • =UNIQUE(A2:A100)

Q13. What is a waterfall chart and when do we use it?

A waterfall chart shows how an initial value is incrementally increased and/or decreased by a series of positive and negative contributions, resulting in a final value. Each “step” is a column that starts where the previous one ended, so you can see the cumulative effect of components.

When to use

Explaining bridges between two numbers: Revenue → Gross Profit → Operating Profit → Net Income.

Budget/forecast walks: Starting balance → adds → subtracts → ending balance.

Variance analysis: Plan vs. Actual with reasons (price, volume, mix).

How to build in Excel (Office 2016+)

Arrange data in two columns (Category, Amount) in logical order. Include starting and ending totals.

Insert → Charts → Waterfall.

Right-click the starting and ending columns → Set as Total (so they “sit on the baseline”).

Format data series for readability (data labels, colors for increases/decreases, sort order if needed).

Older Excel (pre-2016)

Use a stacked column workaround with helper columns: Base, Increase, Decrease, Totals, and set Base to be invisible.

Interview tip
Emphasize that waterfall charts are ideal for storytelling with numbers—stakeholders can instantly see what drove the change and by how much.

Q14. How can you highlight cells with negative values in Excel?

Fastest method (built-in rule)

Select the range.

Home → Conditional Formatting → Highlight Cells Rules → Less Than…

Enter 0, choose a format (e.g., Light Red Fill), OK.

Formula-based rule (to highlight entire rows)

Select all rows of interest, e.g., A2:F500.

Conditional Formatting → New Rule → Use a formula to determine which cells to format.

Formula (assuming the value to test is in column D):
=$D2<0

Set the format (fill/bold), OK.

Alternative (number formatting only)

Format Cells → Number → Custom → 0;[Red]-0;0
This shows negative numbers in red, but doesn’t highlight the cell—and won’t trigger filters.

Pitfalls

If cells contain text like “–” or “-5” stored as text, the rule Less Than 0 won’t catch them—convert to numbers first.

Q15. How can you clear all formatting without removing contents?

Clear formats

Select the range (or press Ctrl+A twice to select the entire sheet’s used range).

Home → Clear (eraser icon) → Clear Formats.

Other useful clears

Clear All: removes contents + formats + comments (be careful).

Clear Contents: keeps formats.

Paste Special → Values: replaces formulas with values but keeps formats (or combine with Clear Formats afterwards).

Shortcuts

Alt → H → E → F (Windows) opens Clear Formats via key tips.

Q16. What is a Pivot Table, and what are its sections?

A Pivot Table summarizes large datasets dynamically—drag fields to quickly aggregate and slice data without writing formulas.

Sections

Filters: Top-level slicer for the entire pivot (e.g., Year).

Columns: Fields that define columns (e.g., Quarter).

Rows: Fields that define rows (e.g., Product).

Values: Aggregations (Sum, Count, Avg, Distinct Count if Data Model is used).

Key features to mention

Grouping (dates into months/quarters/years, numbers into bins).

Calculated Fields/Items (classic pivots) and Measures (Power Pivot / Data Model).

Slicers & Timelines for interactive filtering.

Show Values As (e.g., % of Row Total, Running Total, Difference From).

Q17. Can you make a Pivot Table from multiple tables?

Yes—if there’s a relationship between the tables. You do this via the Data Model (Power Pivot under the hood).

Steps

Load each table to Excel as an Excel Table (Ctrl+T) with clean headers and a unique key in the lookup (dimension) tables.

Insert → PivotTable → From Table/Range → check Add this data to the Data Model.

Repeat for other tables or use Data → Get Data (Power Query) → load to Data Model.

Data → Relationships (or Power Pivot → Diagram View) to define one-to-many relationships (e.g., Sales[ProductID] → Products[ProductID]).

Insert the Pivot from the Data Model and use fields across tables.

Best practice

Use a star schema (fact table + dimensions).

Ensure lookup keys are unique on the “one” side.

Q18. How can we select all blank cells in Excel?

Go To Special

Select the dataset.

Press F5 (Go To) → Special… → choose Blanks → OK.

Excel selects only truly empty cells (not zeros, not formulas returning “”).

If cells contain formulas that return empty text “”

Use a helper column with =LEN(A2)=0 or =A2=”” and filter TRUEs; or

Find & Select → Replace might not help; consider converting “” to actual blanks via formula or Power Query.

Common use

After selecting blanks, type a value (e.g., “N/A”) and press Ctrl+Enter to fill all selected blanks at once.

Q19. What are the most common questions to ask before creating a dashboard?

Who is the audience and what decisions will this drive? (Executive vs. operational users.)

Top 3–5 KPIs and their precise definitions. (Calculation rules, time windows, filters.)

Data sources & ownership. (Systems, refresh method, access/credentials.)

Update frequency & latency tolerance. (Real-time, hourly, daily, weekly.)

Granularity needed. (Daily vs. monthly; transaction vs. summary.)

Drill-down paths & filters. (By region, product, segment.)

Layout constraints. (Screen size, printing, color-blind friendly palette.)

Version/Tooling. (Excel version, Power Query/Power Pivot availability, add-ins.)

Data quality & caveats to display. (Known gaps, exclusions.)

Security. (Row-level restrictions, masking sensitive fields.)

Interview bonus
Explain you’ll prototype quickly, test with stakeholders, then iterate based on feedback.

Q20. What is a Print Area and how can you set it?

Print Area is the range that Excel prints by default for that sheet.

Set

Select the range to print.

Page Layout → Print Area → Set Print Area.

Add to an existing print area

Page Layout → Print Area → Add to Print Area (creates discontiguous areas, each prints on a new page).

Clear

Page Layout → Print Area → Clear Print Area.

Related settings

Print Titles (repeat header rows/columns on each page).

Page Break Preview to adjust page fits.

Fit to One Page: Page Layout → Size/Scaling (careful with unreadable fonts).

Q21. What steps can you take to handle slow Excel workbooks?

Quick wins

Manual calculation when modeling: Formulas → Calculation Options → Manual; press F9 to calculate.

Save as .XLSB (binary) to reduce file size and open/save time.

Minimize volatile functions: OFFSET, INDIRECT, TODAY, NOW, RAND, RANDBETWEEN. Replace with non-volatile alternatives (INDEX instead of OFFSET, helper columns, static dates).

Avoid array-heavy formulas across whole columns. Limit ranges (e.g., A2:A50000 instead of A:A in SUMPRODUCT).

Replace complex formulas with helper columns (precompute once, aggregate later).

Convert heavy formulas to values where calculations are final (Paste Special → Values).

Use Excel Tables & Structured References—they auto-size and are efficient with XLOOKUP, SUMIFS, COUNTIFS.

Reduce conditional formatting rules and ranges; consolidate where possible.

Limit cross-workbook links and 3-D references.

Pictures, shapes, and styles: remove unneeded objects; clean duplicate styles.

Pivot caches: reuse the same cache for similar pivots; disable “Defer Layout Update” while designing.

Power Query / Power Pivot tips

Push heavy transformations into Power Query (query folding to the source where possible).

Use the Data Model (xVelocity) for large analytics instead of worksheet formulas—measures are faster and memory-efficient.

Hardware & options

64-bit Excel for very large models.

Increase RAM; enable multi-threaded calculation (File → Options → Advanced → Formulas).

Diagnostics

Formulas → Evaluate Formula to identify slow constructs.

Check Workbook Statistics; and use Inquire add-in (if available) to spot bloats.

Q22. Can you sort multiple columns at one time?

Yes—use multi-level sorting via the Sort dialog.

Steps

Select your full dataset (include headers).

Data → Sort.

Sort by: choose primary column (e.g., Region), order (A→Z).

Add Level: choose secondary column (e.g., Customer), order (A→Z).

Add Level again for tertiary sort (e.g., Order Date, Oldest to Newest).

Use Options… to enable case-sensitive sort or set Sort left to right for row-wise sorting.

Notes

Excel sorts are not stable by default—so always define all necessary levels.

Use Custom Lists for business-specific orders (e.g., High, Medium, Low or days of week).

Tiny practice dataset you can mention in interviews

Imagine a small Income Walk:

Starting Revenue: 1,000

COGS: −400

Marketing: −120

R&D: −80

Other Income: +30

Taxes: −90

Net Income: 340

This maps perfectly to a waterfall chart and shows how you’d narrate it.

If you want, I can spin up a tiny sample Excel file for you with:

A dataset,

A pivot built from two related tables,

A waterfall chart,

And a few conditional formatting rules—so you can practice and demo.

 

SQL:

1. What is the difference between INNER JOIN and LEFT JOIN in SQL?

INNER JOIN

Returns only rows that have matching values in both tables.

If a record in the left table does not have a match in the right table, it will not appear.

Best used when you need only the intersection of data.

Example: Get only customers who placed orders.

SELECT A.customer_id, B.order_id
FROM Customers A
INNER JOIN Orders B
ON A.customer_id = B.customer_id;

LEFT JOIN

Returns all rows from the left table, plus the matching rows from the right.

If no match is found, NULLs are returned for right-side columns.

Best used when you don’t want to lose unmatched data from the left table.

Example: Get all customers, even if they didn’t order.

SELECT A.customer_id, B.order_id
FROM Customers A
LEFT JOIN Orders B
ON A.customer_id = B.customer_id;
2. How do you handle missing data in a dataset?

Options:

Remove missing values: Use dropna() in Python Pandas. Suitable when missing data is small (<5%).

Imputation: Replace with mean, median, or mode. Works well for numerical/categorical features.

Predictive imputation: Use regression, KNN imputer, or ML models to estimate missing values.

Domain-specific handling: For time series, forward/backward fill may work.

Decision Rule:

If <5% missing → drop.

If important feature → impute.

If large portion missing → consider removing feature.

3. What are the key steps in a data analysis process?
  • Define the problem – What business question are we answering?
  • Collect data – From databases, APIs, Excel, or surveys.
  • Clean & preprocess data – Handle missing values, duplicates, outliers.
  • Analyze data – Using statistics, SQL, Python (Pandas, NumPy).
  • Visualize results – Power BI, Tableau, Matplotlib, Seaborn.
  • Interpret insights – Provide actionable recommendations for business.
4. Difference between OLAP and OLTP

Feature OLTP (Online Transaction Processing) OLAP (Online Analytical Processing)
Purpose Daily operations (insert/update/delete) Business intelligence, reporting
Data Structure Highly normalized Denormalized
Speed Fast writes Fast reads
Example ATM, Banking apps, e-commerce Sales dashboards, Data Warehouses

5. Explain the use of CALCULATE() in Power BI

Purpose: Changes the filter context of a measure.

Example: Calculate sales only for 2023.

Total2023Sales = CALCULATE(SUM(Sales[Amount]), Sales[Year] = 2023)

Useful for:

Year-to-date, quarter-to-date calculations.

Applying multiple conditions dynamically.

6. How do you identify outliers in data?

Statistical methods:

IQR Method: Outliers lie outside [Q1 – 1.5*IQR, Q3 + 1.5*IQR].

Z-score: Points with |z| > 3 are extreme.

Visualization:

Boxplot → shows whiskers and outliers.

Scatter plot → reveals unusual points.

Why important?
Outliers can skew results, so we either remove or transform them.

7. What is normalization in SQL databases?

Definition: Organizing data into tables to remove redundancy and dependency.

Forms:

1NF: Atomic (no repeating groups, no multiple values in a cell).

2NF: No partial dependency (every non-key depends on full primary key).

3NF: No transitive dependency (non-key fields depend only on the primary key).

Benefit: Reduces redundancy, improves data integrity.

8. Difference between WHERE and HAVING in SQL

WHERE: Filters rows before aggregation.

HAVING: Filters after aggregation.

-- Example
SELECT dept, COUNT(*) 
FROM Employees 
GROUP BY dept 
HAVING COUNT(*) > 5; -- Only departments with more than 5 employees
9. What are measures and dimensions in Power BI?

Dimension: Descriptive attributes (Product, Region, Customer).

Measure: Numeric calculation (Sales, Revenue, Profit).

Example:

Dimension → “Region = North”

Measure → “Total Sales = $1M”

10. Scenario: Sales dropped in a region, how will you analyze it?

Compare Year-on-Year (YoY) or Month-on-Month (MoM) trends.

Filter by region, analyze product mix & quantities sold.

Check customer churn, pricing strategy, promotions.

Investigate external factors: competition, supply chain issues, seasonal effects.

Present insights with visuals → e.g., line charts for trend, pie charts for product mix.

11. SQL query to get the second highest salary
SELECT MAX(salary) AS Second_Highest
FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees);

Alternative (using RANK):

SELECT salary
FROM (
SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk
FROM employees
) t
WHERE rnk = 2;
12. How do you optimize a large dataset in Power BI?

Remove unused columns.

Use Star Schema (fact + dimension).

Prefer DAX over M queries for calculations.

Pre-aggregate at SQL/query level.

Disable auto-date/time option.

Use incremental refresh for large datasets.

13. What is data wrangling?

Process of cleaning, transforming, and mapping raw data into a usable format.

Includes:

Handling missing values.

Removing duplicates.

Normalizing formats (date, currency).

Creating new derived columns.

Tools: Power Query, Python (Pandas), SQL.

14. What is the use of GROUP BY in SQL?

Used to group rows based on a column and apply aggregation.

Example:

SELECT department, COUNT(*) 
FROM employees 
GROUP BY department;
15. Explain a project where you used data to drive business decisions.

(Customize with your real experience, here’s a model answer)
I built an e-commerce sales dashboard using Power BI. I analyzed top-selling products, profit margins, and customer segments. I noticed high returns from one region. After root cause analysis, we found a logistics issue. Fixing it reduced return rates by 18% and improved customer satisfaction.

16. SQL query – Running total of sales by product category
SELECT
sale_id,
product_category,
sale_date,
sale_amount,
SUM(sale_amount) OVER (
PARTITION BY product_category
ORDER BY sale_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_total
FROM SalesData;

Use Case: Show cumulative sales trend over time per category.

17. INNER JOIN vs LEFT JOIN vs RIGHT JOIN vs FULL OUTER JOIN

INNER JOIN → Only matches (customers with orders).

LEFT JOIN → All left + matches from right (all customers, even without orders).

RIGHT JOIN → All right + matches from left (all employees, even without tasks).

FULL OUTER JOIN → Everything (all customers + suppliers, matched and unmatched).

Scenario table:

Join Type Use Case
INNER JOIN Customers who placed orders
LEFT JOIN All customers (even if no orders)
RIGHT JOIN All employees (even if no tasks)
FULL OUTER JOIN Combine all customers & suppliers

18. When would you use a CTE instead of subquery/temp table?

CTE (Common Table Expression): Temporary result set defined using WITH.

When to use:

Improve readability (multi-step queries).

Recursive queries (org hierarchy, tree structure).

Avoid repeating same subquery logic.

When temp tables are overkill.

Example:

WITH FilteredSales AS (
SELECT product_id, SUM(sale_amount) AS total_sales
FROM Sales
GROUP BY product_id
)
SELECT product_id
FROM FilteredSales
WHERE total_sales > 1000;

Recursive Example: Employee hierarchy.

19. SQL query to pivot rows → columns (without PIVOT)
SELECT
Region,
MAX(CASE WHEN Quarter = 'Q1' THEN Revenue END) AS Q1_Revenue,
MAX(CASE WHEN Quarter = 'Q2' THEN Revenue END) AS Q2_Revenue
FROM Sales
GROUP BY Region;

Why not PIVOT?

Not supported in MySQL/SQLite.

CASE WHEN is universal + flexible.

20. Monthly Revenue per Product Category

Task: Calculate the total revenue for each product category on a monthly basis.
Answer:

sql
SELECT
    TO_CHAR(s.sale_date, 'YYYY-MM') AS sales_month, -- Use FORMAT() for SQL Server, strftime() for SQLite
    p.product_category,
    SUM(s.revenue_usd) AS total_revenue
FROM
    Sales s
JOIN
    Products p ON s.product_id = p.product_id
GROUP BY
    sales_month, p.product_category
ORDER BY
    sales_month, p.product_category;

Explanation: This query joins the Sales and Products tables to link each sale to its product category. It then groups the results by the year-month of the sale date and the product category, summing up the revenue for each group. The ORDER BY clause ensures the output is sorted chronologically and by category.


21. Top 3 Most Profitable Products in Each Category

Task: For each category, find the top 3 products by total revenue, including ties.
Answer:

sql
WITH ProductRevenue AS (
    SELECT
        p.product_category,
        p.product_name,
        SUM(s.revenue_usd) AS total_revenue
    FROM
        Sales s
    JOIN
        Products p ON s.product_id = p.product_id
    GROUP BY
        p.product_category, p.product_name
)
SELECT
    product_category,
    product_name,
    total_revenue,
    DENSE_RANK() OVER (
        PARTITION BY product_category
        ORDER BY total_revenue DESC
    ) AS rank_in_category
FROM
    ProductRevenue
QUALIFY
    DENSE_RANK() OVER (PARTITION BY product_category ORDER BY total_revenue DESC) <= 3;

Explanation:

  1. CTE (Common Table Expression): The ProductRevenue CTE calculates the total revenue for each product.

  2. Main Query: The main query uses the DENSE_RANK() window function to rank products within their category based on total revenue (descending). DENSE_RANK() is used to handle ties; if two products have the same revenue, they get the same rank, and the next rank is not skipped.

  3. QUALIFY Clause: This filters the results to only include rows where the rank is 3 or lower. (Note: QUALIFY is used in databases like Snowflake and BigQuery. In SQL Server/PostgreSQL, you would use a subquery or another CTE to filter the ranked results).


22. Average Downtime by Maintenance Type and Asset Over Last Year

Task: Calculate the average downtime for each maintenance type and asset, only for assets with at least 5 logs in the last year.
Answer:

sql
SELECT
    asset_id,
    maintenance_type,
    AVG(downtime_hours) AS average_downtime_hours
FROM
    Maintenance_Logs
WHERE
    log_date >= CURRENT_DATE - INTERVAL '1 year' -- Adjust for your SQL dialect
GROUP BY
    asset_id, maintenance_type
HAVING
    COUNT(log_id) >= 5
ORDER BY
    asset_id, maintenance_type;

Explanation: The WHERE clause filters logs from the last 365 days. The GROUP BY calculates the average downtime for each asset and maintenance type combination. The HAVING clause is crucial as it filters these groups to only include those with a count of 5 or more records, ensuring the average is statistically significant.


23. Products with No Sales in the Last 6 Months

Task: List all products that have had zero sales in the last 6 months.
Answer:

sql
SELECT
    p.product_name,
    p.product_category
FROM
    Products p
WHERE
    p.product_id NOT IN (
        SELECT DISTINCT s.product_id
        FROM Sales s
        WHERE s.sale_date >= CURRENT_DATE - INTERVAL '6 months'
    );

Explanation: This query uses a subquery to find all distinct product_id values that have sales in the last 6 months. The main query then selects all products whose product_id is not in that list, effectively identifying the stagnant products.


24. Cumulative Revenue by Department Over Time

Task: Calculate the running total of revenue for each department over time.
Answer:

sql
SELECT
    s.sale_date,
    CASE
        WHEN p.product_category LIKE '%Automation%' OR p.product_category LIKE '%Software%' THEN 'Digital Industries'
        WHEN p.product_category LIKE '%Medical%' THEN 'Healthineers'
        WHEN p.product_category LIKE '%Rail%' THEN 'Mobility'
        ELSE 'Other'
    END AS department,
    SUM(s.revenue_usd) AS daily_revenue,
    SUM(SUM(s.revenue_usd)) OVER (
        PARTITION BY
            CASE
                WHEN p.product_category LIKE '%Automation%' OR p.product_category LIKE '%Software%' THEN 'Digital Industries'
                WHEN p.product_category LIKE '%Medical%' THEN 'Healthineers'
                WHEN p.product_category LIKE '%Rail%' THEN 'Mobility'
                ELSE 'Other'
            END
        ORDER BY s.sale_date
    ) AS cumulative_revenue
FROM
    Sales s
JOIN
    Products p ON s.product_id = p.product_id
GROUP BY
    s.sale_date, department
ORDER BY
    department, s.sale_date;

Explanation:

  1. CASE Statement: Maps the product_category to a Siemens department (e.g., ‘Medical Imaging’ -> ‘Healthineers’).

  2. Daily Revenue: SUM(s.revenue_usd) grouped by sale_date and the derived department gives the daily revenue per department.

  3. Cumulative Revenue: The window function SUM(SUM(...)) OVER (PARTITION BY department ORDER BY sale_date) calculates a running total. It partitions the data by the derived department and orders it by date, so the sum accumulates over time within each department.


25. Employees Who Haven’t Logged Any Maintenance in the Last Quarter

Task: Find inactive employees in specific departments based on maintenance logs.
Answer:

sql
SELECT
    e.employee_name,
    e.department
FROM
    Employees e
WHERE
    e.hire_date < CURRENT_DATE - INTERVAL '3 months'
    AND e.department IN ('Mobility', 'Smart Infrastructure')
    AND e.employee_id NOT IN (
        SELECT DISTINCT m.technician_id
        FROM Maintenance_Logs m
        WHERE m.log_date >= CURRENT_DATE - INTERVAL '3 months'
    );

Explanation: The subquery finds all technician_id values that have logged maintenance in the last quarter. The main query selects employees from the relevant departments, hired before the last quarter, whose employee_id is not found in that subquery list.


26. Average Manufacturing Cost for Products with Revenue Above Average

Task: Find the avg. manufacturing cost for products whose total revenue is above the overall average product revenue.
Answer:

sql
WITH ProductTotals AS (
    SELECT
        product_id,
        SUM(revenue_usd) AS total_revenue
    FROM
        Sales
    GROUP BY
        product_id
)
SELECT
    AVG(p.manufacturing_cost_usd) AS average_cost_of_high_revenue_products
FROM
    Products p
JOIN
    ProductTotals pt ON p.product_id = pt.product_id
WHERE
    pt.total_revenue > (SELECT AVG(total_revenue) FROM ProductTotals);

Explanation:

  1. CTE (ProductTotals): Calculates the total revenue for each product.

  2. Subquery in WHERE: (SELECT AVG(total_revenue) FROM ProductTotals) calculates the average of all individual product revenues.

  3. Main Query: Joins Products with the CTE and filters for products where the total revenue is greater than the overall average calculated by the subquery. Finally, it takes the average of the manufacturing_cost_usd for these high-revenue products.


27. Percentage of Total Revenue by Customer Segment for Each Month

Task: For each month, show what percentage of total revenue each customer segment contributed.
Answer:

sql
SELECT
    TO_CHAR(s.sale_date, 'YYYY-MM') AS sales_month,
    s.customer_segment,
    SUM(s.revenue_usd) AS segment_revenue,
    (SUM(s.revenue_usd) * 100.0) / SUM(SUM(s.revenue_usd)) OVER (PARTITION BY TO_CHAR(s.sale_date, 'YYYY-MM')) AS segment_revenue_percentage
FROM
    Sales s
GROUP BY
    sales_month, s.customer_segment
ORDER BY
    sales_month, s.customer_segment;

Explanation: The key is the window function in the percentage calculation. SUM(SUM(s.revenue_usd)) OVER (PARTITION BY sales_month) calculates the total revenue for each month. Dividing each segment’s monthly revenue by this total and multiplying by 100 gives the percentage contribution of each segment for that month.


28. Identify Consecutive Days of Downtime for Assets

Task: Find assets that had downtime (>0 hours) for 3 or more consecutive days.
Answer:

sql
WITH DowntimeData AS (
    SELECT
        asset_id,
        log_date,
        downtime_hours,
        LAG(log_date, 1) OVER (PARTITION BY asset_id ORDER BY log_date) AS prev_date,
        LAG(log_date, 2) OVER (PARTITION BY asset_id ORDER BY log_date) AS prev_date_2
    FROM
        Maintenance_Logs
    WHERE
        downtime_hours > 0
)
SELECT DISTINCT
    asset_id,
    prev_date_2 AS consecutive_downtime_start_date
FROM
    DowntimeData
WHERE
    log_date - prev_date = 1
    AND prev_date - prev_date_2 = 1;

Explanation:

  1. CTE (DowntimeData): Uses the LAG() function to look back at the previous two log dates for the same asset, but only for records where there was actual downtime.

  2. Main Query: Filters the results to find rows where the current log date is exactly one day after the previous date (log_date - prev_date = 1), and the previous date is exactly one day after the one before that (prev_date - prev_date_2 = 1). This identifies a sequence of three consecutive days. The start of this sequence is prev_date_2.


29. Department with the Highest Average Salary in Last 5 Years

Task: Find the department with the highest average salary for employees hired in the last 5 years.
Answer:

sql
SELECT
    department,
    AVG(salary_usd) AS average_salary_last_5_years
FROM
    Employees
WHERE
    hire_date >= CURRENT_DATE - INTERVAL '5 years'
GROUP BY
    department
ORDER BY
    average_salary_last_5_years DESC
LIMIT 1; -- Use TOP 1 for SQL Server, FETCH FIRST 1 ROW ONLY for Oracle

Explanation: The query filters for recent hires, groups them by department, and calculates the average salary for each group. Ordering the results by this average in descending order and taking the first row gives the department with the highest average salary.

30. What is the difference between RANK(), DENSE_RANK(), and ROW_NUMBER()?

These functions are used to assign a sequential integer to rows within a partition of a result set. The difference lies in how they handle ties (rows with identical values in the ORDER BY clause).

Function Behavior with Ties Sequence After a Tie Example Output for ORDER BY Salary DESC
ROW_NUMBER() Ignores ties. Every row gets a unique number, even if the values are the same. The assignment is arbitrary for tied rows. Continuous, unique sequence. 1, 2, 3, 4, 5
RANK() Recognizes ties. Tied rows get the same rank. Skips subsequent ranks. The number of skipped ranks depends on the number of ties. 1, 2, 2, 4, 5
DENSE_RANK() Recognizes ties. Tied rows get the same rank. Does not skip ranks. The next rank after a tie is the next consecutive number. 1, 2, 2, 3, 4

Your Example Explained:

Given an employees table with these salaries: 100, 90, 90, 80, 70

Your SQL query would produce this result:

name salary rank dense_rank row_num
Alice 100 1 1 1
Bob 90 2 2 2
Charlie 90 2 2 3
David 80 4 3 4
Eve 70 5 4 5

Why this happens:

  • RANK(): Bob and Charlie are tied for 2nd place. Since two people are in 2nd place, RANK() skips the next rank (3) and places David at 4.

  • DENSE_RANK(): Bob and Charlie are tied for 2nd place. DENSE_RANK() then assigns the next consecutive number (3) to David.

  • ROW_NUMBER(): Simply assigns a unique number to each row based on its position in the ordered set. The order for the tied rows (Bob and Charlie) is non-deterministic unless specified further.

Key Takeaway:

  • Use ROW_NUMBER() to get a unique identifier for each row.

  • Use RANK() if you want to know “this person is in the Nth position,” including ties, and are okay with gaps.

  • Use DENSE_RANK() if you want to know “this person is in the Nth group or tier,” without any gaps in the ranking sequence.

POWER BI:

1. How would you optimize a complex Power BI report with multiple data sources and large datasets?

Goal: Improve report load time and ensure smooth user experience.

Best Practices:

Import Mode vs DirectQuery

Import → loads data into memory (faster).

DirectQuery → real-time but slower (use only when live data required).

Reduce Data Volume

Remove unused columns & rows at Power Query stage.

Pre-aggregate data at SQL/stored procedure level.

Use Star Schema

Fact tables (sales, transactions).

Dimension tables (customer, product).

Avoid snowflake models → more joins → slower queries.

Optimize DAX

Use variables (VAR) to avoid repeated calculations.

Replace FILTER(ALL(…)) with REMOVEFILTERS().

Turn off Auto Date/Time → saves hidden tables.

Use Aggregations → summary tables for faster queries.

Reduce visuals → each visual fires a query.

Performance Analyzer Tool → check slow visuals/measures.

Source-level optimization → push transformations to SQL views/SPs.

Example: If you had a 10M row sales table, you’d:

Create aggregated daily sales at SQL level.

Import summary, not transactions.

Use star schema with dimensions.

2. DAX measure for rolling 12-month average in Power BI

Concept: Rolling averages smooth trends (aka TTM – Trailing Twelve Months).

Step 1: Rolling 12M Sales

Rolling 12M Sales =
CALCULATE(
[Total Sales],
DATESINPERIOD(
'Date'[Date],
MAX('Date'[Date]),
-12,
MONTH
)
)

MAX(‘Date'[Date]) → last date in context.

DATESINPERIOD → creates a moving 12-month window.

Step 2: Average

Rolling 12M Average =
DIVIDE([Rolling 12M Sales], 12)

Fiscal Year Adjustment:
If fiscal year starts in April, add:

FiscalMonthNumber = MOD(MONTH('Date'[Date]) - 4 + 12, 12) + 1

Use Case: Analyzing yearly sales trend for April–March fiscal year.

3. How do you implement Row-Level Security (RLS) in Power BI?

Definition: Restricts what data users can see.

Steps:

At dataset level (recommended)

Create Roles → Sales[Region] = “East”.

Or dynamic RLS:

Sales[Region] =
LOOKUPVALUE(EmployeeRegion[Region], EmployeeRegion[Email], USERPRINCIPALNAME())

Publish dataset → Assign users to roles in Power BI Service.

At report level

Not directly supported. Workarounds:

Create separate reports with filtered datasets.

Use shared dataset with RLS enforced.

Scenario:

Sales Manager (East) logs in → sees only East region sales.

No need for multiple reports.

4. How do you handle Slowly Changing Dimensions (SCD) in SQL for Power BI?

Definition: Dimension attributes change slowly (e.g., customer address).

Types:

Type 1: Overwrite (no history).

MERGE INTO DimCustomer AS Target
USING StagingCustomer AS Source
ON Target.CustomerID = Source.CustomerID
WHEN MATCHED THEN
UPDATE SET Target.Address = Source.Address;

Type 2: Add new row (history kept).

UPDATE DimCustomer
SET EndDate = GETDATE()
WHERE CustomerID = @CustomerID AND EndDate IS NULL;

INSERT INTO DimCustomer (CustomerID, Name, Address, StartDate, EndDate)
VALUES (@CustomerID, @NewName, @NewAddress, GETDATE(), NULL);

Type 3: Add new columns (limited history).

In Power BI: Join fact table to active record (EndDate IS NULL) or historical snapshot with date filters.

 

5. How to implement Incremental Refresh in Power BI?

Why: Avoid refreshing 10 years daily → refresh only recent data.

Steps:

Create parameters: RangeStart, RangeEnd.

Use them in Power Query:

= Table.SelectRows(Source, each [SaleDate] >= RangeStart and [SaleDate] < RangeEnd)

Define policy in Power BI Service:

Store past 5 years.

Refresh last 1 day/week.

Use Case: Sales dataset from 2010 → refresh only yesterday’s data daily.

 

6. How to create custom visuals in Power BI using R/Python?

Steps:

Enable R/Python scripting.

Insert R/Python visual.

Write code (example in Python):

import seaborn as sns
import matplotlib.pyplot as plt

sns.boxplot(x='Category', y='Sales', data=dataset)
plt.show()

 

Considerations:

Limited to 150K rows.

Slower than native visuals.

Non-interactive with slicers.

Security (code execution).

Use Case: Heatmap correlation of sales by region.

 

Q7. What are the components of Power BI?

Answer:
Power BI has five main components which together provide a complete BI solution:

  1. Power Pivot – Fetches, cleans, and loads data into the model.

  2. Power Query – Performs transformations (filter, merge, split, reshape).

  3. Power Q&A – Lets users query data in natural language (English).

  4. Power View – For building interactive reports and visuals.

  5. Power Map – For geographic and 3D spatial data visualization.

Together, these components enable data preparation → modeling → visualization → sharing.


Q8. How is Power BI different from Tableau?

Answer:
Both are BI tools, but differ in pricing, usability, and features:

  • Pricing: Power BI is cheaper (especially for larger teams) compared to Tableau.

  • Ease of Use: Power BI is simpler for Microsoft users, Tableau offers advanced customization.

  • Data Connectivity: Tableau has slightly more robust data prep and connection features.

  • Visualization: Tableau is more flexible; Power BI provides easier-to-use visuals.

  • Integration: Power BI integrates seamlessly with Excel, Office 365, SharePoint, Teams.

If cost and Microsoft integration matter, choose Power BI. If advanced analytics and customization matter, choose Tableau.


Q9. How is Power BI different from other BI tools?

Answer:

  • Power BI allows connecting to multiple data sources without coding.

  • Deep integration with Microsoft tools (Office 365, Bing, Azure).

  • Free desktop version allows:

    • Analysis of datasets up to 1GB.

    • 10,000 rows/hour of streaming data.

  • Provides natural language queries through Power Q&A.

  • Compared to tools like QlikView/Tableau, Power BI is more cost-effective and beginner-friendly.


Q10. What is DAX Function?

Answer:
DAX (Data Analysis Expressions) is a formula language used in Power BI, Power Pivot, and SSAS.

  • It is a functional language – every calculation is wrapped in a function.

  • Supported data types: Numeric, Boolean, Date/Time, String, Decimal.

  • Example:

Total Sales = SUM(Sales[SalesAmount])
  • Use Cases: Create measures (KPIs), calculated columns, aggregations.


Q11. What are the types of filters available in Power BI Reports?

Answer:
Power BI supports three levels of filters:

  1. Visual-level filter – Applies only to one chart/visual.

  2. Page-level filter – Applies to all visuals on a page.

  3. Report-level filter – Applies to the entire report (all pages).

This layered filtering gives flexibility in scope and granularity.


Q12. What is the CALCULATE function in DAX?

Answer:
The CALCULATE() function modifies the filter context of an expression.

Syntax:

CALCULATE(<expression>, <filter1>, <filter2>, ...)
  • Expression: The calculation to evaluate.

  • Filter(s): Boolean or table expressions to apply.

Example:

Sales_India = CALCULATE(SUM(Sales[Amount]), Sales[Country] = "India")

Here, only rows where Country = India are considered.


Q13. What are the different challenges faced by a Power BI developer?

Answer:

  • No strong data cleansing features compared to ETL tools.

  • Struggles with large datasets (performance drops after ~20–30k rows in some cases).

  • Complex DAX can impact performance.

  • Bulky UI for beginners.

  • Governance & security (especially in large enterprises).


Q14. How can you do automation in Power BI?

Answer:
Automation is done via Power Automate (Microsoft Flow).

  • Cloud-based automation tool.

  • Supports 200+ connectors (SQL, SharePoint, Outlook, etc.).

  • Automates:

    • Scheduled refresh.

    • Notifications (e.g., email when sales fall below threshold).

    • Data movement between apps.

Example: Auto-send a sales dashboard PDF to managers every Monday morning.


Q15. Can you explain the bookmark function of Power BI?

Answer:
Bookmarks capture the current state of a report (filters, slicers, visuals).

  • Use Cases:

    • Save different views (e.g., Sales by Region vs Sales by Product).

    • Create presentations by navigating through bookmarks.

    • Custom navigation for end-users.


Q16. Explain the major concept of Power BI.

Answer:
Power BI is a self-service, cloud-based BI tool for data visualization and sharing.

  • Connects to multiple sources.

  • Allows transformation & modeling.

  • Provides interactive dashboards & KPIs.

  • Accessible via desktop, web, and mobile.

  • Strong focus on collaboration & sharing.


Q17. What is Power BI Desktop?

Answer:
Power BI Desktop is a Windows application (free) used for building BI reports.

  • Features: Import, clean, model, visualize data.

  • Can publish reports to Power BI Service.

  • Works offline (unlike Power BI Service which is cloud-based).

  • File format: .pbix


Q18. What is Get Data in Power BI?

Answer:

  • A feature to connect Power BI to various data sources.

  • Found in the Menu bar of Power BI Desktop.

  • Supports: Local files (Excel, CSV), cloud sources (Azure, Salesforce), web sources, databases (SQL, Oracle).

  • Enables both live connection and import mode.


Q19. What are the building blocks of Power BI?

Answer:
Five building blocks:

  1. Visualizations – Graphical representations.

  2. Datasets – Collections of data.

  3. Reports – Multi-page visual collections.

  4. Dashboards – One-page summary with pinned visuals.

  5. Tiles – Single visualization on a dashboard.


Q20. What are visualizations in Power BI?

Answer:

  • Visual representation of data.

  • Power BI offers built-in and custom visuals.

  • Examples: Bar chart, Line chart, Pie, Waterfall, Funnel, Map, Gauge, Ribbon.

  • Custom visuals available via marketplace.


Q21. What are reports in Power BI?

Answer:
Reports = collection of visuals related to a dataset.

  • Can span multiple pages.

  • Help in deep analysis.

  • Can be shared via Power BI Service.

  • Example: A sales report may contain visuals for revenue trend, region-wise sales, and product KPIs.


Q22. What do you understand by dashboards in Power BI?

Answer:
Dashboards are single-page summaries made by pinning visuals from reports.

  • Always one page.

  • Used for quick decision-making.

  • Example: Executive dashboard showing Revenue, Expenses, Profit, and Growth Rate on one screen.


Q23. What are the KPIs in Power BI?

Answer:

  • Key Performance Indicators (KPIs) measure progress against a goal.

  • Always has:

    • Base Value (Measure)

    • Target Value

  • Graphically shows performance (green = good, red = bad).

  • Example: Actual Sales vs Target Sales.

Q24. How many versions of Power BI are available as of now? How are they different?

Answer:
Currently three editions are available:

  1. Power BI Desktop (Free):

    • Authoring & development tool.

    • Connect, clean, visualize, and create reports.

    • File format: .pbix.

  2. Power BI Pro (Paid, ~$9.99/user/month):

    • Enables collaboration, sharing, publishing.

    • Ad-hoc analysis.

  3. Power BI Premium (Enterprise):

    • Advanced analytics, big data, AI support.

    • Dedicated cloud & on-prem capacity.

    • High-performance, scalable solution.


Q25. What are Slicers in Power BI?

Answer:

  • Slicers = visual filters placed on the report canvas.

  • Allow users to interactively filter data.

  • Example: A slicer for “Year” lets users toggle between 2021, 2022, etc.

  • Unlike background filters, slicers are visible & user-controlled.


Q26. Discuss about the Content Packs in Power BI.

Answer:
Content Packs are pre-built collections of:

  • Datasets

  • Reports

  • Dashboards

  • Excel Workbooks

Two types:

  1. Service Provider Content Packs – e.g., Salesforce, Google Analytics.

  2. User-Created Content Packs – Built & shared within organizations.

Purpose: Quick sharing & deployment of BI solutions.


Q27. What are Custom Visuals in Power BI?

Answer:
Custom Visuals are user-developed visuals built using Power BI SDK.

  • Created with JavaScript libraries (D3.js, jQuery, R, Python).

  • Examples: Sankey charts, Gantt charts, Bullet graphs.

  • Available on Microsoft AppSource marketplace.

  • Extend visualization beyond built-in charts.


Q28. What data sources can you connect to Power BI?

Answer:
Power BI supports wide connectivity:

  1. Files: Excel (.xlsx), CSV (.csv), PBIX.

  2. Content Packs: From providers (Salesforce, Google Analytics) or internal users.

  3. Databases: SQL Server, Oracle, MySQL, PostgreSQL, Azure SQL DB.

  4. Connectors: Azure Analysis Services, OData, APIs, web sources.


Q29. How are a Power BI dashboard and a report different from each other?

Answer:

Feature Dashboard Report
Pages Always one page Can have multiple pages
Data Sources Multiple datasets One dataset
Filtering No filtering Filtering & slicing allowed
Alerts Supports alerts No alerts
Dataset Access Cannot view underlying data Can view tables & fields

Q30. What is Power Pivot?

Answer:
Power Pivot is an in-memory component in Power BI.

  • Stores compressed data for faster access.

  • Builds data models & relationships.

  • Creates calculated columns & measures.

  • Supports large datasets efficiently.


Q31. What is Power View?

Answer:

  • A visualization technology for interactive reports.

  • Available in Excel, SQL Server, SharePoint, Power BI.

  • Supports charts, graphs, maps, and slicers.

  • Useful for exploring and storytelling with data.


Q32. Define Power Query.

Answer:
Power Query is an ETL tool for shaping and transforming data.

  • Provides GUI to clean, merge, filter, split, pivot/unpivot.

  • Generates M language code in the background.

  • Allows combining multiple sources.


Q33. What is the major difference between a Filter and a Slicer?

Answer:

  • Filter: Applied in the background, not visible to end-users.

  • Slicer: Visible, interactive filter element placed on the report canvas.

Slicers are more user-friendly & dynamic.


Q34. What is the Power BI Service?

Answer:
Cloud-based platform for:

  • Publishing & sharing reports.

  • Collaboration via Workspaces.

  • Scheduled refresh.

  • Row-Level Security.

  • Access on web & mobile.


Q35. Share some key points about DAX usage in Power BI.

Answer:

  • Used for calculated columns, measures, tables.

  • Similar to Excel formulas but more powerful.

  • Works with row context & filter context.

  • Enables KPIs like ratios, YoY growth, moving averages.


Q36. Which table functions will you use to group data in Power BI?

Answer:

  • SUMMARIZE(): Groups data by one or more columns.

  • SUMMARIZECOLUMNS(): Optimized grouping function in Power BI.


Q37. Explain the concept of Power BI DAX.

Answer:
DAX has three fundamentals:

  1. Syntax – Formula rules (e.g., = SUM(Sales[Amount])).

  2. Context – Row Context (row by row), Filter Context (applied filters).

  3. Functions – Predefined (SUM, AVERAGE, CALCULATE, IF, FILTER).


Q38. Explain some commonly used DAX functions.

Answer:

  • Date/Time: DATE, HOUR.

  • Time Intelligence: DATESYTD, PREVIOUSMONTH.

  • Logical: IF, AND, OR.

  • Math: DIVIDE, ROUND.

  • Statistical: COUNTROWS, AVERAGE.

  • Text: CONCATENATE, LEFT, FORMAT.

  • Table: FILTER, DISTINCT, VALUES.


Q39. What is a Summarize function in DAX?

Answer:
Used to aggregate and group data.

Syntax:

SUMMARIZE(<table>, <grouping_column>, [<name>, <expression>])

Example:

SUMMARIZE(Sales, Sales[Region], "TotalSales", SUM(Sales[Amount]))

Q40. What are DAX Calculation types?

Answer:
Two main types:

  1. Calculated Columns: Row-by-row calculations.

    • Example: Profit = Sales[Revenue] - Sales[Cost].

  2. Measures: Aggregations across entire dataset.

    • Example: Total Sales = SUM(Sales[Revenue]).


Q41. Explain Power BI Query Editor.

Answer:

  • Tool for data transformation before loading.

  • Functions: Remove columns, merge, pivot/unpivot, filter.

  • Does not change source data – only Power BI dataset.


Q42. Are Power View and Power Query the same?

Answer:

  • Power Query: Used for data preparation (ETL).

  • Power View: Used for visualization & reporting.


Q43. What is Parameter in Power BI?

Answer:

  • A dynamic value used in queries/reports.

  • Can be reused across multiple queries.

  • Example: Instead of hardcoding year = 2023, create a Year Parameter.


Q44. How to sort data in Power BI?

Answer:

  • Alphabetical / numeric sort.

  • Sort by Column (e.g., Month sorted by MonthNumber).

  • Custom DAX-based sort.


Q45. How to combine two columns in Power BI?

Answer:
Use DAX Calculated Column:

FullName = Employee[FirstName] & " " & Employee[LastName]

Q46. What is Power BI Data Gateway? How can we use it?

Answer:
Bridge between on-premises data and Power BI Service.

  • Types: Personal & Enterprise gateway.

  • Uses:

    • Secure data transfer.

    • Required for scheduled refresh.

  • Example: Refresh on-prem SQL Server data in cloud dashboard.


Q47. How is the Filter function used in Power BI?

Answer:
The FILTER() DAX function restricts rows of a table.

Syntax:

FILTER(<table>, <boolean_expression>)

Example:

Sales_India = CALCULATE(SUM(Sales[Amount]), FILTER(Sales, Sales[Country] = "India"))

Returns only sales where Country = India.

 

At Learnomate Technologies, we don’t just teach tools—we train you with real-world, hands-on knowledge that sticks. Our Data Analyst training program is designed to help you crack job interviews, build impactful projects, and grow confidently in your data-driven career.

Want to see how we teach? Hop over to our YouTube channel for bite-sized tutorials, student success stories, and data analysis techniques explained in simple English.

Ready to get certified and hired? Check out our Data Analyst course page for full curriculum details, placement assistance, and batch schedules.

Curious about who’s behind the scenes? I’m Ankush Thavali, founder of Learnomate and your guide for all things data, analytics, and visualization. Let’s connect on LinkedIn—I regularly share practical insights, job alerts, and learning tips to keep you ahead of the curve.

And hey, if this article got your curiosity going…

👉 Explore more on our blog, where we simplify complex topics like SQL, Python, Power BI, Excel, and advanced analytics.

Thanks for reading! Now it’s time to turn this knowledge into action. Happy learning, and see you in class or in the next blog!

Happy Analyzing!

ANKUSH