Import, DirectQuery, or Direct Lake? How to Choose the Right Storage Mode for Massive Data in Power BI and Microsoft Fabric

Import, DirectQuery, and Direct Lake each break down at different points in Power BI and Fabric. Learn where each storage mode wins, fails, and why.

MP

A model running smoothly in Import mode on 20 million rows can grind to a halt when scaled to 2 billion — and the wrong storage mode choice will silently sabotage your performance, DAX patterns, and refresh strategy. By the end of this article, you’ll know where each storage mode breaks down for large datasets, how Direct Lake actually behaves in Fabric, and which critical modeling and DAX patterns fall apart depending on your choice.

Import: Still the Default — Until You Hit the Memory Wall

Import mode’s strength is VertiPaq compression, local cache, and blazing-fast visuals — until you cross the boundary where memory or refresh time collapses. The common misconception is that Import is always superior unless you need “live” data, but there are two hard ceilings:

  • Capacity memory limits: Even with aggressive columnar compression, massive tables (say, Fact_Sales with 500M+ rows and 100+ columns) will eventually outstrip what your Premium/Fabric SKU allocates.
  • Refresh duration: The model must reload all data each refresh. Over slow or unreliable sources, refreshes can fail or block for hours.

What many overlook is that VertiPaq performance is not just a function of row count, but also cardinality and column count. For example, a Fact_Sales table with 10 columns of low cardinality can compress to a fraction of RAM compared to 3 columns with high cardinality (e.g., unique GUIDs or high-entropy strings).

The other hidden Import trap is time intelligence: if your Date table isn’t gap-free and marked, DAX time functions may produce silent errors. This is a modeling problem, not a storage mode one, but large models often inherit imperfect data — and Import won’t warn you.

Naive DAX in Import: Storage Looks Free, Until It Isn’t

Consider a calculated column like this:


-- Calculated column in Fact_Sales
Is_Large_Sale = IF([Amount] > 10000, "Yes", "No")

This seems harmless. But with 500M rows, that column is now physically materialized in memory, and every added column multiplies your VertiPaq footprint. The “just add a column” habit scales poorly — especially when columns are low-compressibility or string-heavy. In Import, every modeling choice is a storage cost, and you pay at model load and refresh time.

DirectQuery: The Devil in the Formula Engine Details

DirectQuery’s pitch is “unlimited data size” — query the source directly, no data imported. The catch is that all DAX logic not directly translatable to SQL must be executed by the formula engine row by row, with every visual triggering live source queries. This is where scale breaks most developer expectations:

  • Visual interactivity is throttled by source latency and concurrency. The formula engine in Power BI is single-threaded per query. On a report page with several visuals, each can trigger multiple round-trips — and even simple slicer changes can mean seconds of waiting.
  • Some DAX “just works” in Import but fails in DirectQuery due to unsupported functions or query folding breaks. For instance, custom scalar functions, some time intelligence, and complex calculated tables can’t be pushed to SQL and either error out or kill performance.
  • Security and RLS: Row-level security in DirectQuery pushes predicates to the source, which can expose performance potholes if the underlying tables aren’t indexed for those filters. This is often invisible in small-scale dev setups but explodes in production.

DirectQuery is not just “Import with live data.” It’s a different engine. The moment your DAX measure cannot be folded, the formula engine loops over result sets in memory — and on large models, this means waiting, not working.

Worked Example: The DAX That Folds in Import, But Fails in DirectQuery

Take a Fact_Sales table in DirectQuery, with a measure calculating running total:


-- Naive running total
Running Total Amount = 
CALCULATE(
    SUM(Fact_Sales[Amount]),
    FILTER(
        ALLSELECTED(Dim_Date),
        Dim_Date[Date] <= MAX(Dim_Date[Date])
    )
)

This works (and is fast) in Import mode because VertiPaq can scan and aggregate in memory. In DirectQuery, FILTER and ALLSELECTED force the engine to materialize potentially huge intermediate results, and only a subset of DAX translates to a single SQL query. The user sees a spinning wheel — or worse, a timeout.

Even more subtle: the measure above may work for small filters, but as the date selection widens, the number of SQL subqueries and the rows scanned balloon. DirectQuery punishes naïve DAX that would be trivial in Import.

Direct Lake: What Actually Works as of Fabric’s Current Implementation

Direct Lake is Fabric’s play for “lakehouse-fast,” combining the live-data promise of DirectQuery with Import-like speed — but only under specific conditions that are often misunderstood:

  • Direct Lake reads Parquet/Delta tables directly from OneLake and loads them into VertiPaq segments at query time. This means it can deliver sub-second visuals on millions of rows if the data is shaped for columnar read and not filtered at high cardinality.
  • It falls back to DirectQuery behaviors under certain conditions: If you use unsupported DAX, features not yet available (as of current releases), or query formats that can’t be satisfied from the local segments, Power BI quietly drops to DirectQuery. This fallback is silent — there’s no UI warning — and performance drops to DirectQuery levels.
  • Refresh is not refresh as you know it: There is no scheduled refresh in the same sense as Import; the data is always current as per the underlying Delta/Parquet lakehouse. But schema drift, security, and file-level changes can introduce unexpected errors or stale reads if not managed carefully.

The counterintuitive result: Direct Lake is not a free lunch. It is not “DirectQuery but fast” or “Import without refresh” — it is its own mode, with its own breakpoints and gotchas. For massive datasets, it only wins if the data is modeled to maximize segment reuse and avoid fallback triggers.

Worked Example: Direct Lake Fallback Without Warning

Suppose you have a ~2B row Fact_Sales Delta table in OneLake and build a Power BI model in Direct Lake mode. You create a DAX measure:


-- Uses SELECTEDVALUE (supported)
Current Day Sales = 
CALCULATE(
    SUM(Fact_Sales[Amount]),
    Dim_Date[Date] = SELECTEDVALUE(Dim_Date[Date])
)

This runs at Import-like speeds — until you add a measure with a DAX function not supported by Direct Lake (say, PATH or custom table expressions). That single measure can force the entire visual’s query to drop to DirectQuery behavior, without any warning. The report slows down, and the cause isn’t visible in the UI.

The catch: as of current Fabric releases, the list of DAX features that trigger fallback is not fixed and should be checked against Microsoft’s up-to-date documentation. Relying on “just works” leads to hidden regressions after each platform update.

The Storage Mode Decision Matrix: What Actually Drives the Choice for Massive Data

Most published guidance still frames the storage mode choice as “Import unless you need real-time; DirectQuery for live data; Direct Lake for Fabric.” That’s shallow. The real drivers at scale are:

  • Refresh SLA and source volatility: If data latency of up to several hours is acceptable and your refresh window is reliable, Import remains the king for query speed. But the moment your refresh regularly overruns, or you’re burned by schema drift, Direct Lake becomes attractive.
  • Modeling complexity and DAX patterns: If your model is measure-heavy, with complex custom DAX or calculated columns/tables, only Import is truly safe. DirectQuery and Direct Lake both restrict what DAX can be pushed to the engine, and unsupported patterns break or degrade silently.
  • Data size vs. cardinality: Raw row count is less critical than the distinct count in key columns. High-cardinality columns kill compression in Import, and kill segment pushdown in Direct Lake. For DirectQuery, they turn every slicer into a source query bottleneck.
  • Security and governance: RLS in Direct Lake and DirectQuery pushes predicates to the source; in Import, it’s evaluated in memory. Large enterprise models with sensitive data and complex RLS logic can see very different performance and leakage risks depending on the mode.
  • Fabric integration: If you’re all-in on Fabric, Direct Lake removes the “data movement” layer. But if your source is SQL Server, Dataverse, or Oracle, Direct Lake is not in play — you’re choosing between Import and DirectQuery only.

It’s not a one-size-fits-all. The right choice is dictated by the hardest constraint in your scenario, not by theory or textbook definitions.

Checklist: What I Validate Before Choosing a Storage Mode for Large Models

  • Estimate model size in Import, factoring in column cardinality and compression. Use the Desktop “Model size” view and DAX Studio VertiPaq Analyzer for a realistic footprint — not just row count.
  • Prototype essential DAX measures in both Import and DirectQuery/Direct Lake, and check query plans. If a measure doesn’t fold, expect pain at scale.
  • Test RLS logic with realistic user predicates in DirectQuery/Direct Lake — and verify the queries generated against the backend source. Surprises almost always surface here.
  • Review the current (not year-old) documentation for Direct Lake’s supported DAX features and fallback triggers. This changes frequently; never assume parity with Import.
  • Understand your refresh SLAs and what happens when the source schema changes mid-stream. Import is brittle here; Direct Lake/Query is more live, but at the cost of stability and potential silent failure.

Actionable Takeaway

Don’t commit to a storage mode for large Power BI or Fabric models based on “best practice” charts or default recommendations. Build a scale prototype, stress test your actual DAX and RLS logic, and know the real behaviors — especially where Direct Lake’s fallback or DirectQuery’s formula engine bottlenecks will hurt you. The right mode is the one that breaks last for your specific combination of data shape, refresh SLA, and DAX patterns — and that answer will change as Fabric evolves.

MP
Max Power
Published May 31, 2026  ·  Updated May 31, 2026