Power BI — SA Migration Guide¶
This page is designed for Solution Architects assessing a customer's Power BI estate and planning a migration or modernization toward Databricks. It is not a developer or admin guide.
Platform Architecture¶
graph TD
subgraph Authoring
PBD[Power BI Desktop\n.pbix file]
PBRS_AUTH[Report Server\nReport Builder]
end
subgraph PowerBI_Service["Power BI Service (Cloud SaaS)"]
GW[On-Premises Data Gateway]
DS[Dataset / Semantic Model\nAnalysis Services engine]
CAP[Premium Capacity\nDedicated compute]
API[REST API / XMLA Endpoint]
WS[Workspace\nApp Workspace]
PORTAL[Web Portal / Power BI App]
end
subgraph PowerBI_RS["Power BI Report Server (On-Prem)"]
RS_ENGINE[Report Processing Engine]
RS_DB[ReportServer DB\nSQL Server]
RS_PORTAL[Web Portal]
RS_SCHED[Scheduler Service]
end
subgraph Consumers
BROWSER[Browser\nInteractive]
EMAIL[Email Subscription]
EMBED[Embedded App\niframe / SDK]
API_CONSUMER[API / Automation]
end
subgraph Sources
SQL[(Relational DB\nSQL Server / Azure SQL)]
AAS[Azure Analysis Services\n/ SSAS Cube]
FILES[Flat Files\nExcel / CSV]
CLOUD[Cloud Sources\nAzure / Snowflake / etc.]
DW[Databricks SQL\nLakehouse]
end
PBD -->|Publish .pbix| WS
PBRS_AUTH -->|Upload .rdl / .pbix| RS_PORTAL
Sources --> GW
GW --> DS
Sources --> DS
DS --> CAP
CAP --> PORTAL
PORTAL --> BROWSER
PORTAL --> EMAIL
PORTAL --> EMBED
API --> API_CONSUMER
WS --> API
RS_ENGINE --> RS_DB
RS_SCHED --> RS_ENGINE
RS_PORTAL --> BROWSER
RS_ENGINE --> EMAIL
Data Flow¶
graph LR
subgraph Sources
SQL[(SQL / DW)]
CUBE[OLAP Cube\n/ SSAS / AAS]
FILE[Files\nExcel / CSV]
CLOUD[Cloud API\n/ SaaS]
end
subgraph Processing
GW[Data Gateway\nOn-Prem Proxy]
DS[Semantic Model\nImport / DirectQuery\n/ Live Connection]
REFRESH[Scheduled Refresh\nor DirectQuery]
end
subgraph Render_Deliver["Render & Deliver"]
SVC[Power BI Service\nReport Engine]
SNAP[Cached Snapshot\nor Live Query]
PORTAL[Web Portal\n/ Mobile App]
EMAIL2[Email\nSubscription]
EMBED2[Embedded\niframe / SDK]
FILE_OUT[Export\nPDF / Excel / CSV]
end
SQL --> GW
CUBE --> GW
FILE --> GW
CLOUD --> GW
GW --> DS
CLOUD -->|Direct connector| DS
DS --> REFRESH
REFRESH --> SVC
SVC --> SNAP
SNAP --> PORTAL
SNAP --> EMAIL2
SNAP --> EMBED2
SNAP --> FILE_OUT
1. Ecosystem Overview¶
Power BI is Microsoft's flagship self-service and enterprise BI platform. It covers the full analytics lifecycle — from desktop data modeling to cloud-hosted interactive dashboards, paginated reports, and embedded analytics. Microsoft positions Power BI as the BI layer of the Microsoft Fabric platform and it competes directly with Tableau, Qlik, and Looker in enterprise deals.
Product Variants and Editions¶
| Variant | Deployment | Key Characteristics |
|---|---|---|
| Power BI Desktop | Local authoring tool (Windows) | Free, used to create .pbix files; not a server product |
| Power BI Service | SaaS (app.powerbi.com) | Cloud-hosted workspace, sharing, collaboration, scheduled refresh |
| Power BI Report Server | On-premises | Self-managed server, ships with SQL Server or Power BI Premium; hosts .pbix and paginated .rdl reports |
| Power BI Embedded | Azure PaaS (A-SKU) | Embed Power BI reports in custom apps; billed per capacity, not per user |
| Power BI Premium | Add-on to Service (P-SKU / EM-SKU) | Dedicated capacity, paginated reports, XMLA read/write, deployment pipelines |
| Microsoft Fabric | Unified SaaS platform | Power BI is the BI and reporting workload inside Fabric; shares the same Semantic Model (Analysis Services) engine |
Why Customers Use It¶
- Self-service analytics — business users build and share dashboards without IT
- Executive and operational dashboards — KPI monitoring connected to SQL Server, Azure SQL, or Azure Synapse
- Paginated reporting — pixel-perfect, print-ready reports replacing SSRS (Power BI Premium or Report Server)
- Embedded analytics — reports embedded in ISV products, customer portals, or internal apps via Power BI Embedded
- Microsoft stack consolidation — natural choice when the customer is already on Azure, Teams, SharePoint, or Dynamics
Key Discovery Questions¶
- How many reports and dashboards exist in the workspace(s)? How many were accessed in the last 90 days?
- Is the customer on Power BI Service (cloud), Power BI Report Server (on-prem), or both?
- Which license tier — Free, Pro, Premium Per User (PPU), Premium capacity (P-SKU), or Fabric?
- Are paginated reports (.rdl) in use? These are a separate migration workstream from .pbix reports.
- How are reports consumed — interactive portal, Teams/SharePoint embed, embedded in a custom app, email subscription, scheduled PDF/Excel export?
- What data sources are connected — SQL Server, Azure SQL, SSAS/AAS cubes, Excel files, cloud SaaS connectors?
- Are datasets (semantic models) shared across multiple reports, or is each report self-contained?
- Is DirectQuery or Live Connection in use? (Significant latency and governance implications)
- Is row-level security enforced? Is it static (fixed roles) or dynamic (username-based)?
- Are Power BI Apps used to package and distribute content to large audiences?
- Are Dataflows (Power Query / ETL inside Power BI) in use as a data prep layer?
- Is Power BI Embedded used — and if so, what application owns the embed tokens?
2. Component Architecture¶
| Component | Role | Migration Equivalent | SA Note |
|---|---|---|---|
| Power BI Desktop | Windows authoring tool; creates .pbix files containing data model, DAX measures, and report layouts | Power BI Desktop (retained) or Databricks SQL dashboard authoring | Not a server component — lives on developer laptops. Source control gap is common. |
| Power BI Service | Cloud SaaS host for workspaces, datasets, reports, dashboards; manages refresh schedules and sharing | Databricks SQL workspace + Unity Catalog + partner BI service | The cloud tenant is the authoritative catalog; REST API is the inventory tool. |
| Power BI Report Server | On-prem server; hosts .pbix and paginated .rdl reports with a web portal and scheduler | Databricks SQL + Power BI Service (cloud migration) or SSRS replacement path | On-prem installs are common in regulated industries; XMLA and REST API both available. |
| Semantic Model (Dataset) | In-memory Analysis Services engine (VertiPaq for Import, DirectQuery for pass-through); stores DAX measures and relationships | Databricks SQL Serverless + Unity Catalog semantic layer; or keep Power BI as BI layer over Databricks | The most complex migration artifact. DAX logic often duplicates transformation logic that belongs in dbt or Databricks SQL. |
| Data Gateway | On-prem agent that proxies queries from Power BI Service to on-premises data sources | Databricks workspace with VPC/VNet peering or Private Link to on-prem sources | Gateway is often a shared, unmonitored service. Single point of failure for all cloud-refreshed on-prem data. |
| Power BI REST API | Programmatic access to workspace metadata, report exports, refresh triggers | Databricks REST API + Unity Catalog APIs | Critical for inventory. Use Admin API (tenant-wide) not workspace API for full estate scan. |
| XMLA Endpoint | Analysis Services protocol endpoint for Premium/PPU; allows third-party tools to read/write the semantic model | Databricks SQL endpoint (JDBC/ODBC) | Enables ALM Toolkit, Tabular Editor, and external BI tools to connect directly to semantic models. |
| Deployment Pipelines | Premium/Fabric feature for Dev → Test → Prod promotion of workspaces | Databricks Repos + CI/CD pipelines | Three-stage pipeline is built-in to Premium. Not available in Pro license. |
| Dataflows (Gen1/Gen2) | Power Query-based ETL inside Power BI Service; produces reusable datasets stored in Azure Data Lake | Databricks ETL pipelines / Delta Live Tables / dbt | Dataflows are a hidden data engineering layer inside BI. Often overlooked in migration scoping. |
3. Artifact Lifecycle¶
| Stage | What Happens | Where (Server/Client) | Artifact Involved | Migration Risk |
|---|---|---|---|---|
| Author | Developer opens Power BI Desktop, imports or connects to data, builds data model, writes DAX, designs report visuals | Client (Windows app) | .pbix file (local) |
No native source control in Desktop; teams use Git manually or via Fabric Git integration. Stale local files are common. |
| Deploy/Publish | Developer clicks Publish in Desktop (or CI/CD pipeline uploads via API); .pbix is uploaded to a Service workspace | Client → Server | .pbix uploaded to Power BI Service REST API |
Workspace may have no deployment pipeline — manual publish to prod is common. No rollback without backup. |
| Compile / Model Load | Service loads the semantic model into VertiPaq (columnar in-memory engine) for Import mode; no compile step for DirectQuery | Server | In-memory VertiPaq columnar store | Import model size limits (1 GB for Pro, configurable for Premium). Large models may fail to load. |
| Refresh (Import mode) | Scheduled or on-demand refresh; gateway fetches data from source, re-loads VertiPaq cache | Server (gateway agent on-prem, processing in cloud) | Refreshed dataset in Service | Refresh failures are silent unless monitored. Large tables with no incremental refresh re-load fully each run. |
| Execute / Query | User opens report → DAX queries sent to semantic model engine → VertiPaq returns aggregates (Import) or passes SQL to source (DirectQuery) | Server | DAX query against in-memory model or source DB | DAX complexity and cardinality drive query time. DirectQuery under heavy load stresses the source system. |
| Render | Report visuals rendered in browser as interactive SVG/HTML; paginated reports rendered server-side as PDF/HTML/Excel | Client (interactive reports), Server (paginated reports) | Rendered report page or exported file | Paginated reports (.rdl) render server-side and are a separate migration workstream. |
| Deliver | User accesses portal, receives email subscription snapshot, opens Teams tab, views embedded iframe, or calls Export API | Server (email/export), Client (portal/embed) | PNG snapshot, PDF, Excel, or live interactive report | Email subscriptions deliver static snapshots — not live data. Embedded apps use embed tokens with expiry. |
SA Tip: The Publish step is where most customers have a governance gap. Without deployment pipelines, developers publish directly to production workspaces. Ask whether the customer has a Dev/Test/Prod separation — if not, the migration is also a governance improvement project.
4. Data Sources and Dataset Model¶
Power BI supports two connection modes that have very different migration implications:
- Import — data is loaded into VertiPaq in-memory cache; fast queries, but refresh latency and size limits apply
- DirectQuery — queries pass through to the source system in real time; no data stored in Power BI; source system bears all query load
- Live Connection — read-only connection to an existing SSAS or AAS semantic model; Power BI is purely a rendering layer
- Composite models — mix of Import and DirectQuery tables in a single model; complex to migrate
Data Source Types¶
| Data Source Type | Frequency in Customer Estates | Migration Path | Risk |
|---|---|---|---|
| SQL Server / Azure SQL (Import) | Very high | Databricks SQL warehouse or Unity Catalog table; repoint connector | Low — standard SQL, easy to repoint |
| SQL Server / Azure SQL (DirectQuery) | High | Databricks SQL Serverless (DirectQuery equivalent via partner BI) | Medium — query patterns must perform on Databricks SQL |
| Azure Synapse Analytics | Medium | Databricks SQL (natural peer); repoint connector | Low |
| SSAS / Analysis Services Live Connection | High | Keep AAS or migrate model to Databricks + partner BI semantic layer | High — entire semantic layer abstraction must be preserved or rebuilt |
| Azure Analysis Services | Medium | Databricks + Power BI Premium composite or migrate DAX to Databricks SQL | High — DAX measures must be rewritten or kept in Power BI |
| Excel / flat files | High | Delta tables in Unity Catalog; autoloader ingestion | Medium — often ad-hoc; ownership unclear |
| SharePoint Lists | Medium | Databricks ingestion via API or ADF | Medium — refresh frequency and schema drift |
| Salesforce / SaaS connectors | Medium | Databricks partner connectors (Fivetran, Airbyte) → Delta tables | Medium — connector semantics differ |
| Power BI Dataflows | Medium | Replace with dbt / Delta Live Tables | Medium — hidden ETL logic must be extracted |
| Databricks (existing) | Low-growing | Direct connector; minimal migration | Low |
Parameters and Filters¶
Power BI supports report-level filters, visual-level filters, page filters, and DAX-based dynamic measures. Dynamic M Query parameters (used to filter data at source query time) are a common pattern for large tables — they pass values from report slicers into the source query. This is a migration risk: it encodes business logic in the M query layer that may not survive a connector swap.
SA Tip: Ask whether the customer uses dynamic M parameters or query folding optimizations. These are often invisible in the UI but break when the data source changes.
5. Rendering and Delivery Model¶
| Delivery Mode | How It Works | Business Use Case | Migration Equivalent | Risk |
|---|---|---|---|---|
| Interactive portal (powerbi.com) | User opens report in browser; live DAX queries against model | Self-service analytics, dashboards | Databricks SQL dashboard or partner BI workspace | Low — standard portal pattern |
| Power BI App | Packaged workspace content distributed to large audiences without workspace access | Governed report distribution to many consumers | Databricks SQL workspace + access control | Medium — App packaging and audience management must be replanned |
| Teams / SharePoint embed | Tab in Teams channel or SharePoint page embedding a report via iframe | Collaboration, intranet dashboards | Teams tab with Databricks SQL or partner BI | Medium — requires re-embed in new surface |
| Email subscription (snapshot) | Scheduled static PNG/PDF snapshot sent to a distribution list | Executive reports, operational alerts | Partner BI native email subscription or Databricks Workflows + export API | Medium — snapshot delivery often has informal recipient lists |
| Data-driven subscriptions (Premium) | Subscription list driven by a dataset column — personalized reports sent to each recipient | Bursting / personalized delivery to many users | Custom Databricks Workflow + Export API + email service | High — encodes business logic; no direct equivalent in most partner BI |
| Power BI Embedded (A-SKU) | Reports embedded in custom web/mobile app using embed tokens; app manages auth | ISV product, customer-facing portal | Power BI Embedded (retained) or Tableau Embedded or custom Databricks SQL embed | High — embed token lifecycle, app code changes, capacity model all change |
| Export API (PDF/Excel/PNG) | REST API call triggers server-side render and returns file | Automated report generation, downstream file consumers | Partner BI export API or Databricks SQL query export | Medium — API contract changes between platforms |
| Paginated report (RDL) delivery | Pixel-perfect report rendered server-side; delivered as PDF, Word, Excel, or via Report Server web portal | Finance, compliance, invoices, regulated outputs | SSRS (retained) or Power BI Premium paginated reports (retained) or migrate .rdl to new platform | High — paginated reports are a separate product with different migration path |
SA Tip: Data-driven subscriptions (Premium feature) are the single highest-risk delivery pattern. They deliver personalized report snapshots to a list of recipients driven by a dataset — essentially a bursting engine. There is no native equivalent in Databricks or most partner BI tools. Flag this early and plan a custom workflow replacement.
6. Project Structure and Version Control¶
Power BI workspaces are the primary organizational unit in the Service. A workspace contains datasets (semantic models), reports, dashboards, dataflows, and paginated reports. Workspaces can be packaged into Apps for read-only distribution to consumers.
Folder / Workspace Structure¶
- Personal workspace — each user has a private workspace; content here is invisible to admins unless audit logs are checked
- Shared workspaces — team workspaces with role-based access (Admin, Member, Contributor, Viewer)
- Deployment pipeline stages — Premium feature maps workspaces to Dev / Test / Prod stages
- Fabric workspace — unified workspace for Fabric items (Lakehouses, Notebooks, semantic models, reports)
Version Control¶
Power BI has historically had no native version control. The .pbix file is binary (ZIP-based) and does not diff cleanly. Options customers use:
| Approach | How It Works | Limitation |
|---|---|---|
| No version control | Most common; developers keep local copies | No audit trail; accidental overwrites are a real risk |
| SharePoint / OneDrive | .pbix files saved to SharePoint for informal versioning | No branching; no meaningful diff |
| Git via Power BI Desktop Projects (.pbip) | Newer format that serializes the model as JSON/TMDL files for Git | Requires opt-in; not widely adopted yet |
| Fabric Git Integration | Fabric workspaces can be connected to Azure DevOps or GitHub repos | Only for Fabric workspaces; new feature, limited adoption |
| ALM Toolkit / Tabular Editor | External tools that extract and diff the semantic model via XMLA endpoint | Premium/PPU required; developer toolchain adoption varies |
SA Tip: Ask directly: "Is the workspace the source of truth, or do developers have source-controlled project files?" In most Power BI estates, the Service workspace is the only copy. This makes pre-migration backup and export critical before any destructive changes.
Environment Promotion¶
Without deployment pipelines, promotion is manual publish. With deployment pipelines (Premium), workspaces are promoted stage-by-stage with optional parameter overrides per stage (e.g., data source pointing to prod vs. dev database).
7. Orchestration and Scheduling¶
Dataset Refresh Scheduling¶
Import-mode datasets must be refreshed on a schedule to keep data current. Refresh is configured per-dataset in the Service:
- Up to 8 refreshes/day (Pro), 48 refreshes/day (Premium)
- Requires a gateway if source is on-premises
- Incremental refresh (Premium/PPU) can partition large tables and refresh only recent partitions
- Refresh failures send email to dataset owner (if configured)
External Trigger via API¶
Dataset refresh can be triggered via the Power BI REST API (POST /datasets/{id}/refreshes). This is commonly used to chain Power BI refresh after an ETL pipeline completes. The XMLA endpoint also supports TMSL refresh commands from external tools.
Subscriptions and Scheduled Delivery¶
| Delivery Type | Trigger | Recipients | Parameters |
|---|---|---|---|
| Report subscription | Scheduled (cron-like) | Fixed list of Power BI Pro/PPU users | Optional: report page filter state at time of snapshot |
| Dashboard subscription | Scheduled | Fixed list | None (dashboard tile snapshot) |
| Paginated report subscription | Scheduled | Fixed list or SharePoint folder | Can pass report parameters per schedule |
| Data-driven subscription (Premium) | Scheduled | Dynamic list from dataset column | Parameter values from dataset per recipient |
Migration Target¶
| Current Pattern | Migration Target |
|---|---|
| Dataset refresh on schedule | Databricks Workflows job → refresh partner BI dataset via API |
| Report subscription (email) | Partner BI native email subscription |
| Data-driven subscription | Databricks Workflow + Export API + SendGrid/Outlook |
| API-triggered refresh | Databricks Workflows webhook trigger → partner BI API |
8. Metadata, Lineage, and Impact Analysis¶
Where the Catalog Lives¶
The authoritative catalog is the Power BI Service REST API (tenant-scoped Admin API). For on-prem Power BI Report Server, the catalog is the ReportServer SQL Server database.
Key API Endpoints (Power BI Service)¶
# List all workspaces (Admin)
GET https://api.powerbi.com/v1.0/myorg/admin/groups?$top=5000
# List all reports in a workspace
GET https://api.powerbi.com/v1.0/myorg/admin/groups/{groupId}/reports
# List all datasets (semantic models) in tenant
GET https://api.powerbi.com/v1.0/myorg/admin/datasets?$top=5000
# Get dataset datasources (what data sources does this model connect to?)
GET https://api.powerbi.com/v1.0/myorg/admin/datasets/{datasetId}/datasources
# Get activity log (who accessed what, when)
GET https://api.powerbi.com/v1.0/myorg/admin/activityevents?startDateTime=...&endDateTime=...
# Get refresh history for a dataset
GET https://api.powerbi.com/v1.0/myorg/datasets/{datasetId}/refreshes
Key Report Server DB Queries (On-Prem)¶
-- Count all reports by type
SELECT Type, COUNT(*) AS report_count
FROM ReportServer.dbo.Catalog
WHERE Type IN (2, 5) -- 2 = RDL report, 5 = .pbix report
GROUP BY Type;
-- Reports not accessed in 90 days
SELECT Name, Path, CreationDate, ModifiedDate, ExecutionTime
FROM ReportServer.dbo.Catalog c
LEFT JOIN ReportServer.dbo.ExecutionLogStorage e
ON c.ItemID = e.ReportID
AND e.TimeStart > DATEADD(day, -90, GETDATE())
WHERE c.Type = 2
AND e.ReportID IS NULL;
-- List all subscriptions with schedule
SELECT c.Name, c.Path, s.Description, sc.NextRunTime, s.LastStatus
FROM ReportServer.dbo.Subscriptions s
JOIN ReportServer.dbo.Catalog c ON s.Report_OID = c.ItemID
JOIN ReportServer.dbo.ReportSchedule rs ON s.SubscriptionID = rs.SubscriptionID
JOIN ReportServer.dbo.Schedule sc ON rs.ScheduleID = sc.ScheduleID;
-- Data source connections per report
SELECT c.Name, c.Path, ds.Name AS datasource_name, ds.Extension, ds.ConnectionString
FROM ReportServer.dbo.Catalog c
JOIN ReportServer.dbo.DataSource ds ON c.ItemID = ds.ItemID;
Lineage¶
Power BI Service has built-in data lineage view in Premium/PPU workspaces — it shows the chain from data source → dataflow → dataset → report → dashboard. For Pro license, lineage must be inferred by joining dataset-to-report relationships from the Admin API.
SA Tip: The most valuable single artifact for scoping is the Activity Log (Power BI Service) or the ExecutionLogStorage table (Report Server). Pull 90 days of activity. You will typically find that 60–70% of reports have had zero views — these are safe candidates for decommission rather than migration.
9. Data Quality and Governance¶
Row-Level Security (RLS)¶
Power BI RLS is defined in the semantic model (dataset) as DAX filter expressions applied to table rows based on the authenticated user's role membership.
| RLS Pattern | How It Works | Migration Complexity |
|---|---|---|
| Static RLS | Fixed roles with hardcoded DAX filters (e.g., [Region] = "West") |
Low — translate to Unity Catalog row filters |
| Dynamic RLS | DAX filter uses USERPRINCIPALNAME() to look up current user in a security table |
Medium — Unity Catalog row filters with current_user() function |
| Object-level security (OLS) | Hides entire tables or columns from specific roles in the semantic model | Medium — Unity Catalog column masks + workspace permissions |
| Report-level RLS via Power BI Embedded | Embed token encodes effective username; RLS evaluated in the model | High — embed token logic must be replicated in new embed architecture |
Column-Level Security¶
Power BI handles column-level security via Object Level Security (OLS) in the semantic model — roles can be denied access to specific columns or entire tables. This is Premium/PPU only. Migration target is Unity Catalog column masks.
Data Freshness and Caching¶
- Import mode — data freshness is determined by refresh schedule; stale data risk between refreshes
- DirectQuery — always live; no caching at model level (tile-level dashboard cache is separate)
- Query caching (Premium) — frequently used query results cached for dashboard tiles
Custom Code and Scripting¶
| Custom Code Type | Where It Lives | Migration Risk |
|---|---|---|
| DAX measures | Inside the semantic model | Medium — must be rewritten or kept in Power BI over Databricks |
| M (Power Query) transformations | Inside dataset or Dataflow | Medium — replace with dbt/Databricks ETL |
| R and Python visuals | Inside .pbix report | High — requires R/Python runtime on Report Server; not supported in all embed scenarios |
| Custom visuals (.pbiviz) | Third-party or custom-built visual components | High — licensing, security review, and availability in new platform |
| Paginated report expressions (VB.NET) | Inside .rdl files | High — identical to SSRS migration; expression language is VB.NET |
SA Tip: Ask specifically about R/Python visuals and custom visuals from AppSource. Both require additional runtime dependencies and security approvals. Custom visuals in particular have ISV licensing attached and may not transfer to a new platform.
10. File Formats and Artifact Reference¶
.pbix — Power BI Desktop File¶
What it is: The primary authoring artifact. A ZIP-compressed binary file containing the semantic model, report layout, data model metadata, and (for Import mode) a compressed snapshot of the imported data.
| Property | Value |
|---|---|
| Created by | Power BI Desktop |
| Stored in | Developer local disk; published to Power BI Service or Report Server |
| Contains | Semantic model (VertiPaq/TMDL), DAX measures, relationships, report pages, visual configs, M queries, embedded data snapshot (Import mode) |
| Human-readable | No (binary ZIP; internal files partially readable) |
| Migration target | Keep Power BI over Databricks SQL; or extract semantic model and rebuild in Databricks |
SA Tip: A .pbix published to a Premium workspace can be downloaded back from the Service via the portal or API — but only if the dataset was originally uploaded (not created in-service). Always verify download availability before migration; some premium datasets cannot be exported.
.pbip — Power BI Desktop Project (newer format)¶
What it is: A folder-based project format that serializes the semantic model as TMDL (Tabular Model Definition Language) JSON files and the report layout as JSON — enabling Git-based version control.
| Property | Value |
|---|---|
| Created by | Power BI Desktop (opt-in preview / GA in 2024) |
| Stored in | Git repository or local file system |
| Contains | /Model/ TMDL files, /Report/ JSON layout files, pbip manifest |
| Human-readable | Yes (JSON/TMDL) |
| Migration target | TMDL model can be imported via Tabular Editor or ALM Toolkit |
SA Tip: Most customers are still on .pbix. .pbip adoption is growing but limited to teams that have explicitly opted in. Don't assume the customer has .pbip files without asking.
.rdl — Report Definition Language (Paginated Reports)¶
What it is: XML-based paginated report definition, identical to SSRS .rdl format. Used in Power BI Report Server and Power BI Premium paginated reports.
| Property | Value |
|---|---|
| Created by | Power BI Report Builder, Visual Studio (SSDT), SSRS Report Designer |
| Stored in | Report Server catalog (ReportServer DB) or Service (paginated reports workspace) |
| Contains | Data source refs, datasets (SQL/MDX queries), parameters, layout definitions, expressions (VB.NET) |
| Human-readable | Yes (XML) |
| Migration target | Power BI Premium paginated reports (Service), SSRS, or rebuild as Databricks SQL + partner BI |
SA Tip: .rdl in Power BI Report Server is identical to SSRS .rdl. If the customer has both Power BI .pbix and .rdl reports on Report Server, treat the .rdl workstream as an SSRS migration.
.rds — Shared Data Source¶
What it is: Reusable XML data source definition stored on the Report Server — used by .rdl paginated reports.
| Property | Value |
|---|---|
| Created by | Report Builder, SSDT, Report Manager |
| Stored in | ReportServer DB catalog |
| Contains | Connection string, authentication method, credential storage |
| Human-readable | Yes (XML) |
| Migration target | Databricks SQL warehouse connection string in Unity Catalog or partner BI data source |
.rsd — Shared Dataset¶
What it is: A reusable query definition (SQL or MDX) stored on the Report Server, shared across multiple .rdl reports.
| Property | Value |
|---|---|
| Created by | Report Builder, SSDT |
| Stored in | ReportServer DB catalog |
| Contains | Data source reference, query, parameters, field list |
| Human-readable | Yes (XML) |
| Migration target | Databricks SQL query / dbt model |
Power BI Dataflow (.json manifest + Parquet in ADLS)¶
What it is: Power Query-based ETL pipeline defined inside Power BI Service; output stored as Parquet files in Azure Data Lake Storage Gen2.
| Property | Value |
|---|---|
| Created by | Power BI Service Dataflows editor |
| Stored in | Azure Data Lake Storage Gen2 (customer's storage account) + Service metadata |
| Contains | M query transformations, entity definitions, incremental refresh config, CDM schema |
| Human-readable | JSON manifest readable; M queries are text |
| Migration target | Delta Live Tables, dbt, or Databricks ETL notebook |
SA Tip: Dataflows are the most overlooked artifact in Power BI migrations. Customers often don't think of them as "data engineering" — but they are. Pull the list of dataflows from the Admin API and treat each one as a separate ETL migration item.
Power BI Dataset / Semantic Model (Service artifact)¶
What it is: The in-service representation of a published .pbix semantic model, hosted as an Analysis Services instance.
| Property | Value |
|---|---|
| Created by | Published from Desktop (.pbix) or created via XMLA push |
| Stored in | Power BI Service capacity (in-memory VertiPaq) |
| Contains | Tabular model, DAX measures, relationships, row-level security roles, refresh schedule |
| Human-readable | Accessible via XMLA endpoint (Tabular Editor, SSMS) if Premium/PPU |
| Migration target | Keep Power BI semantic layer over Databricks SQL; or rebuild metrics in Unity Catalog |
Artifact Quick Reference¶
| Artifact | Extension / Location | Human-Readable | Where Stored | Migration Target | Risk Level |
|---|---|---|---|---|---|
| Power BI Desktop file | .pbix |
No (binary ZIP) | Local disk / Service workspace | Keep Power BI over Databricks or rebuild | Medium |
| Power BI Desktop Project | .pbip (folder) |
Yes (JSON/TMDL) | Git / local disk | TMDL import via ALM Toolkit | Low |
| Paginated report | .rdl |
Yes (XML) | Report Server DB / Service | Power BI Premium paginated or SSRS path | High |
| Shared data source | .rds |
Yes (XML) | Report Server DB | Databricks SQL warehouse connection | Low |
| Shared dataset | .rsd |
Yes (XML) | Report Server DB | Databricks SQL query / dbt model | Medium |
| Dataflow | JSON manifest + Parquet | Partial | ADLS Gen2 + Service metadata | Delta Live Tables / dbt | Medium–High |
| Semantic model (in-service) | Service API artifact | Partial (XMLA) | Service capacity | Keep Power BI or rebuild DAX in SQL | High |
| Custom visual | .pbiviz |
No | .pbix file | Evaluate per ISV; no general path | High |
11. Migration Assessment and Artifact Inventory¶
How to Inventory the Estate¶
| Method | When to Use | Coverage |
|---|---|---|
| Power BI Admin REST API | Primary method for Service (cloud) | Full tenant; includes personal workspaces with Admin scope |
| Activity Log API | Usage data (90-day rolling window) | All content access events; required for identifying unused reports |
| Report Server DB queries | On-prem Report Server | Full catalog, subscriptions, data sources, execution log |
| Scanner API (Fabric) | Premium/Fabric tenants; richer metadata | Dataset tables, measures, data source details, sensitivity labels |
| Power BI REST API (workspace-scoped) | Non-admin access | Only content the caller has access to; misses personal workspaces |
SA Tip: Do not rely on the Power BI portal UI for counting. Personal workspaces are excluded from workspace views, and system-created datasets (auto-generated for Excel uploads, streaming datasets) inflate counts. Use the Admin API.
Complexity Scoring¶
| Dimension | Low | Medium | High | Critical |
|---|---|---|---|---|
| Data source type | SQL Server / Azure SQL (Import) | DirectQuery relational; Dataflows | SSAS / AAS Live Connection; composite models | Custom connector; on-prem cube + DirectQuery composite |
| Query / dataset complexity | Simple SELECT queries | Stored procedures; multi-table joins with M transforms | Complex M (Power Query) with dynamic parameters | OLAP/MDX queries; dynamic query folding |
| Parameter complexity | No parameters or simple text/date | Static dropdown parameters | Cascading parameters; dynamic M Query parameters | Data-driven parameters with external lookup tables |
| Custom code | No custom code | Basic DAX measures | Complex DAX time-intelligence; R/Python visuals | Custom .pbiviz visuals with ISV licensing |
| Report nesting / drill-through | Flat single-page report | Drill-through between reports; tooltips | Cross-report drill-through; subreports in RDL | Deep nested subreports + cross-workspace drill-through |
| Delivery complexity | Portal only | Email subscription (fixed list) | Data-driven subscription; Power BI App | Embedded in custom app with embed token logic |
| Security complexity | No RLS | Static RLS roles | Dynamic RLS with user lookup table | OLS + dynamic RLS + embed token effective identity |
Common Migration Blockers (Power BI–Specific)¶
- DAX semantic layer not portable — The semantic model's DAX measures and relationships are Analysis Services–native. There is no automated conversion to SQL or dbt. If the customer uses Power BI as both the ETL and semantic layer, the migration scope doubles.
- SSAS / AAS Live Connection — Reports using Live Connection have no embedded model; they rely entirely on the SSAS/AAS instance. Migrating these reports requires migrating the entire Analysis Services model first.
- Data-driven subscriptions — Premium-only bursting feature with no equivalent in Databricks or standard partner BI. Each use case requires a custom workflow replacement.
- Power BI Embedded embed tokens — Custom applications managing embed tokens must be re-architected. Embed token generation, token expiry handling, and effective identity mapping are application-layer concerns, not just BI-layer concerns.
- Personal workspace content — Reports and datasets in personal workspaces are invisible without Admin API access. Customers are often surprised by how much content lives in personal workspaces and is effectively unmanaged.
- Dataflows as hidden ETL — Customers frequently don't disclose dataflows during initial scoping because they don't think of them as "data pipelines." Always explicitly ask and scan for dataflows.
Sample Inventory Query (Power BI Service — PowerShell / REST)¶
# Requires Power BI Management module and Admin rights
# Install-Module -Name MicrosoftPowerBIMgmt
Connect-PowerBIServiceAccount
# Get all workspaces
$workspaces = Get-PowerBIWorkspace -Scope Organization -All
# Get all reports across all workspaces
$allReports = @()
foreach ($ws in $workspaces) {
$reports = Get-PowerBIReport -WorkspaceId $ws.Id -Scope Organization
foreach ($r in $reports) {
$allReports += [PSCustomObject]@{
WorkspaceName = $ws.Name
WorkspaceId = $ws.Id
ReportName = $r.Name
ReportId = $r.Id
WebUrl = $r.WebUrl
DatasetId = $r.DatasetId
}
}
}
$allReports | Export-Csv -Path "pbi_report_inventory.csv" -NoTypeInformation
-- Report Server: full inventory with usage, subscriptions, data source type
SELECT
c.Name AS report_name,
c.Path AS report_path,
c.Type AS report_type, -- 2=RDL, 5=pbix
c.CreationDate,
c.ModifiedDate,
MAX(e.TimeStart) AS last_accessed,
COUNT(e.InstanceName) AS access_count_90d,
MIN(ds.Extension) AS datasource_type,
CASE WHEN s.SubscriptionID IS NOT NULL THEN 'Y' ELSE 'N' END AS has_subscription
FROM ReportServer.dbo.Catalog c
LEFT JOIN ReportServer.dbo.ExecutionLogStorage e
ON c.ItemID = e.ReportID
AND e.TimeStart > DATEADD(day, -90, GETDATE())
LEFT JOIN ReportServer.dbo.DataSource ds
ON c.ItemID = ds.ItemID
LEFT JOIN ReportServer.dbo.Subscriptions s
ON c.ItemID = s.Report_OID
WHERE c.Type IN (2, 5)
GROUP BY c.Name, c.Path, c.Type, c.CreationDate, c.ModifiedDate, s.SubscriptionID
ORDER BY last_accessed DESC;
12. Migration Mapping to Databricks¶
Report Types¶
| Power BI Report Type | Databricks / Target |
|---|---|
| Interactive .pbix report (Import / DirectQuery) | Keep Power BI as BI layer over Databricks SQL; or rebuild key reports as Databricks SQL dashboards / Lakeview |
| Paginated .rdl report (pixel-perfect) | Power BI Premium paginated reports (retained) or migrate to SSRS equivalent |
| Dashboard (tiles) | Databricks SQL dashboard or Lakeview |
| Power BI App (packaged workspace) | Databricks SQL workspace + governed sharing |
| Streaming dataset report | Databricks Structured Streaming + partner BI live connection |
Data Sources¶
| Power BI Data Source | Databricks Target |
|---|---|
| SQL Server / Azure SQL (Import) | Databricks SQL warehouse; Delta table in Unity Catalog |
| SQL Server / Azure SQL (DirectQuery) | Databricks SQL Serverless (partner BI DirectQuery connector) |
| Azure Synapse Analytics | Databricks SQL (peer platform; migrate workloads) |
| SSAS / AAS Live Connection | Databricks SQL + Power BI semantic model over Databricks (composite or Direct Lake) |
| Flat files / Excel | Delta table via Autoloader; Unity Catalog external table |
| Power BI Dataflow | Delta Live Tables / dbt model |
| Salesforce / SaaS | Fivetran / Airbyte → Delta table in Unity Catalog |
Dataset and Semantic Layer¶
| Power BI Concept | Databricks Target |
|---|---|
| Dataset / semantic model (DAX) | Keep Power BI semantic layer over Databricks SQL Direct Lake mode |
| DAX measures (simple aggregations) | Databricks SQL queries or dbt metrics |
| DAX time-intelligence measures | dbt date spine + Databricks SQL window functions |
| Power Query (M) transformations | dbt models / Delta Live Tables |
| Dataflows (reusable ETL) | Delta Live Tables or dbt pipelines |
Security¶
| Power BI Security | Databricks Target |
|---|---|
| Static RLS roles | Unity Catalog row filters (static predicates) |
Dynamic RLS (USERPRINCIPALNAME()) |
Unity Catalog row filters with current_user() |
| Object Level Security (OLS) — columns | Unity Catalog column masks |
| Object Level Security (OLS) — tables | Unity Catalog table-level permissions |
| Workspace roles (Admin/Member/Contributor/Viewer) | Databricks workspace groups + Unity Catalog privileges |
| Power BI Embedded effective identity | Databricks SQL service principal + row filter per identity |
Scheduling and Delivery¶
| Power BI Pattern | Databricks Target |
|---|---|
| Dataset scheduled refresh | Databricks Workflows job → partner BI REST API refresh |
| Report email subscription (fixed list) | Partner BI native email subscription |
| Data-driven subscription (bursting) | Databricks Workflow + Export API + email service (custom build) |
| API-triggered refresh | Databricks Workflows webhook → partner BI REST API |
| Incremental refresh | Delta table Z-order + Databricks DLT incremental pipelines |
Portal and Governance¶
| Power BI Concept | Databricks Target |
|---|---|
| Workspace portal | Databricks SQL workspace |
| Power BI App distribution | Databricks workspace groups + governed catalog |
| Deployment pipelines (Dev/Test/Prod) | Databricks Repos + CI/CD (GitHub Actions / Azure DevOps) |
| Sensitivity labels (Microsoft Purview) | Unity Catalog tags + Databricks data classification |
| Audit logs (Activity Log API) | Databricks audit logs + Unity Catalog system tables |
Embedded App Integration¶
| Power BI Embed Pattern | Databricks Target |
|---|---|
| Power BI Embedded (A-SKU, embed token) | Retain Power BI Embedded over Databricks Direct Lake; or migrate to Tableau Embedded |
| iFrame embed in Teams / SharePoint | Teams tab with Databricks SQL or partner BI |
| Embed in custom web app (JavaScript SDK) | Power BI JavaScript SDK (retained) or partner BI embed SDK |
What Doesn't Map Cleanly¶
| Power BI Capability | Why It Doesn't Map | SA Recommendation |
|---|---|---|
| DAX semantic model (complex measures) | DAX is an Analysis Services–native calculation language. Databricks SQL uses ANSI SQL + window functions. Complex time-intelligence DAX (SAMEPERIODLASTYEAR, TOTALYTD, etc.) has no direct SQL equivalent without significant rewrite. | Keep Power BI semantic layer over Databricks SQL via Direct Lake mode. Migrate DAX only if customer wants to consolidate to a single SQL layer. |
| Direct Lake mode (Fabric) | Direct Lake is a Fabric-specific connection mode where Power BI reads directly from OneLake Delta tables without import or DirectQuery. It only exists in Microsoft Fabric; there is no equivalent in standalone Databricks. | Position Direct Lake as the target architecture for Microsoft-stack customers migrating to Databricks + Fabric co-existence, not as a general migration path. |
| Data-driven subscriptions (bursting) | Personalized report delivery to a dynamic recipient list driven by a dataset column. No native equivalent in Databricks, Tableau, or Looker. | Design a custom Databricks Workflow that exports parameterized reports via the partner BI Export API and sends via SendGrid or Microsoft Graph email API. Scope as custom development. |
| Power BI Embedded embed token lifecycle | Embed tokens are Azure AD–based short-lived tokens generated by a backend service principal. The token embeds effective identity for RLS. No other BI platform has an equivalent token model. | Retain Power BI Embedded as the embed layer with Databricks as the data backend (Direct Lake or DirectQuery). Full re-platform requires rewriting the application's auth and embed layer. |
| Custom visuals (.pbiviz, AppSource) | Third-party or ISV-built visual components with their own licensing, security review requirements, and update cadence. Visuals are not portable to other BI platforms. | Inventory all custom visuals and identify ISV-sourced ones. For each, evaluate whether a native visual in the target platform meets the requirement, or whether the ISV has a version for the target tool. |
| Power BI Goals / Scorecards | OKR-style scorecards natively embedded in the Power BI Service. No equivalent in Databricks or standard BI tools. | Replicate as a standard dashboard with status indicators, or use a dedicated OKR tool (e.g., Viva Goals). Flag as out of scope for BI migration. |
13. Quick-Reference Cheat Sheet¶
| Topic | Key Fact | SA Question to Ask |
|---|---|---|
| Report count heuristic | Expect 60–70% of reports to be unused (zero access in 90 days). Personal workspaces often hold 20–30% of total content. | "Can we pull the Activity Log or ExecutionLog for the last 90 days to identify active vs. unused content?" |
| Where usage data lives | Power BI Service: Activity Log API (90-day rolling, Admin scope required). Report Server: ReportServer.dbo.ExecutionLogStorage table. |
"Do you have an Admin account we can use to pull the tenant-wide Activity Log, or is access restricted?" |
| Top migration risk #1 | DAX semantic models are not portable to SQL. Complex DAX = keep Power BI as the semantic layer over Databricks. | "Are the DAX measures in your datasets simple aggregations, or do they use complex time-intelligence or many-to-many relationships?" |
| Top migration risk #2 | Data-driven subscriptions (bursting) have no native equivalent. Each use case is a custom build. | "Do you use data-driven subscriptions in Power BI Premium to send personalized reports to different recipients from a dataset?" |
| Top migration risk #3 | Power BI Embedded applications require re-architecting the embed token logic — not just the report content. | "Which applications embed Power BI reports, and does your dev team own the embed token generation code?" |
| Source control gap | Most customers have no version control for .pbix files. The Service workspace is the only copy. | "If we needed to roll back a report to last week's version, could you do that today?" |
| License tier matters | Paginated reports, data-driven subscriptions, XMLA endpoint, deployment pipelines — all require Premium or PPU. Features available depend on SKU. | "What Power BI license tier are you on — Pro, Premium Per User, or Premium capacity?" |
| Dataflows are hidden ETL | Dataflows are often not disclosed in initial scoping. Each dataflow is a separate ETL migration item. | "Do you use Power BI Dataflows as a data preparation layer feeding your datasets?" |
| Personal workspaces | Content in personal workspaces is invisible without Admin API access. Often 20–30% of total estate. | "Has anyone inventoried personal workspace content, or should we include that in scope?" |
| On-prem gateway dependency | A single gateway failure blocks all cloud-refreshed on-prem datasets. Often unmonitored and unmaintained. | "How many gateways do you have, and when was the last time the gateway software was updated?" |