The art of simplicity

Python vs PySpark notebooks in MS Fabric

Being new to Microsoft Fabric, I noticed that you have a choice of engine when writing notebooks in Python: run your code with PySpark (backed by a Spark cluster) or with Python (running natively on the notebook's compute). Both options look almost identical on the surface, since you're writing Python syntax either way, but under the hood they behave very differently, and picking the wrong one can cost you time, money, and unnecessary complexity. In this post I try to identify the key differences and give you some heuristics for deciding which engine to reach for.

Python vs PySpark: what's actually different?

When you select PySpark in a Fabric notebook, your code runs on a distributed Apache Spark cluster. Fabric spins up a cluster, distributes your data across multiple worker nodes, and executes transformations in parallel. The core abstraction is the DataFrame (or RDD), and operations are lazy: nothing actually runs until you trigger an action like .show() ...
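Lazy evaluation is easier to picture with a toy model you can run anywhere, without a Spark cluster. The sketch below uses a hypothetical, standard-library-only `LazyPipeline` class (not part of Spark or Fabric) that, like a Spark DataFrame, only records transformations such as `filter` and `map` and executes them when an action (`collect`) is called. In a real PySpark notebook the same split applies: `df.filter(...)` just builds a plan, and an action like `.show()` runs it on the cluster.

```python
from typing import Any, Callable, Iterable, List, Optional, Tuple

# A recorded step: ("filter" or "map", the function to apply)
Step = Tuple[str, Callable[[Any], Any]]


class LazyPipeline:
    """Toy illustration of lazy evaluation (hypothetical; not a Spark API)."""

    def __init__(self, source: Iterable[Any], steps: Optional[List[Step]] = None) -> None:
        self._source = source
        self._steps: List[Step] = list(steps or [])

    def filter(self, predicate: Callable[[Any], bool]) -> "LazyPipeline":
        # Transformation: record the step and return a new pipeline; no data is touched.
        return LazyPipeline(self._source, self._steps + [("filter", predicate)])

    def map(self, fn: Callable[[Any], Any]) -> "LazyPipeline":
        # Transformation: also only recorded, not executed.
        return LazyPipeline(self._source, self._steps + [("map", fn)])

    def collect(self) -> List[Any]:
        # Action: replay the recorded plan over the source data.
        rows = list(self._source)
        for kind, fn in self._steps:
            if kind == "filter":
                rows = [row for row in rows if fn(row)]
            else:
                rows = [fn(row) for row in rows]
        return rows


calls: List[int] = []


def is_even(x: int) -> bool:
    calls.append(x)  # side effect so we can see when the predicate actually runs
    return x % 2 == 0


pipeline = LazyPipeline(range(10)).filter(is_even).map(lambda x: x * 10)
print(len(calls))          # 0: building the pipeline executed nothing
print(pipeline.collect())  # [0, 20, 40, 60, 80]
print(len(calls))          # 10: the predicate ran once per row during collect()
```

Real Spark goes further: because it sees the whole plan before running it, it can optimize that plan and spread the work across worker nodes, which this toy does not attempt.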

How to work with OneLake files locally using Python

Last week I shared how you could use the OneLake File Explorer to sync your Lakehouse tables to your local machine. It's a convenient way to get your Parquet and Delta Lake files off the cloud and onto disk, but what do you actually do with them once they're there? In this post, I'll walk you through how to interact with your locally synced OneLake files using Python. We'll cover four practical approaches, with real code you can drop straight into a notebook.

Where are your files?

When OneLake File Explorer syncs your files, they land in a path that looks something like this:

C:\Users\<you>\OneLake - <workspace name>\<lakehouse name>.Lakehouse\Tables\<table name>

Keep that path in mind: you'll be passing it into every example below. Delta Lake tables are stored as folders containing multiple Parquet files plus a _delta_log/ directory, so make sure you're pointing at the table's root folder, not an individual file.

Readin...
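Because every approach starts from the table's root folder, a small path check can save some head-scratching. This is a minimal sketch using only `pathlib`; the helper names (`onelake_table_path`, `is_delta_table_root`, `list_parquet_files`) and the workspace, lakehouse, and table names are hypothetical. It assumes the sync layout shown above and treats "contains a `_delta_log/` directory" as the sign of a Delta table root.

```python
from pathlib import Path
from typing import List


def onelake_table_path(home: Path, workspace: str, lakehouse: str, table: str) -> Path:
    # Mirrors the sync layout above:
    # <home>\OneLake - <workspace name>\<lakehouse name>.Lakehouse\Tables\<table name>
    return home / f"OneLake - {workspace}" / f"{lakehouse}.Lakehouse" / "Tables" / table


def is_delta_table_root(path: Path) -> bool:
    # A Delta table root is a folder that directly contains a _delta_log/ directory.
    return path.is_dir() and (path / "_delta_log").is_dir()


def list_parquet_files(table_root: Path) -> List[Path]:
    # Data files only: skip anything inside _delta_log/ (e.g. checkpoint Parquet files).
    return sorted(
        p for p in table_root.rglob("*.parquet")
        if "_delta_log" not in p.relative_to(table_root).parts
    )


if __name__ == "__main__":
    # Hypothetical names: replace them with your own workspace, lakehouse, and table.
    table = onelake_table_path(Path.home(), "MyWorkspace", "MyLakehouse", "orders")
    if is_delta_table_root(table):
        print(f"{table} looks like a Delta table ({len(list_parquet_files(table))} Parquet files)")
    else:
        print(f"{table} is not a Delta table root; check the path")
```

The Parquet listing is for inspection only. A Delta table folder can still hold data files that are no longer part of the current table version (until they are vacuumed), so reading every Parquet file directly may return stale rows; a Delta-aware reader that consults `_delta_log/` avoids that.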

VSCode - Change Python version

After installing the latest Python version on my local machine, I noticed that VSCode was still referring to an old(er) version. In this post I'll show how to fix this. Let's dive in!

I installed a new Python version using the official installer: Download Python | Python.org. However, when I tried to run a Python program in VSCode, the output in the terminal showed that an older version was still being used:

& C:/Users/bawu/AppData/Local/Microsoft/WindowsApps/python3.9.exe d:/Projects/Test/MarkItDownImages/example.py
Traceback (most recent call last):
  File "d:\Projects\Test\MarkItDownImages\example.py", line 1, in <module>
    from markitdown import MarkItDown
ImportError: cannot import name 'MarkItDown' from 'markitdown' (C:\Users\bawu\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\markitdown\__init__.py)

The interpreter path gives it away: VSCode launched python3.9.exe from the WindowsApps folder instead of the version I had just installed.

To fix it, open...
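Before and after changing anything, it helps to confirm which interpreter actually runs your script. Here's a minimal sketch using only the standard library: `sys.executable` is the full path of the running interpreter and `sys.version_info` its version. The helper names and the `(3, 10)` minimum are illustrative assumptions, not something VSCode or markitdown defines.

```python
import sys
from typing import Tuple


def describe_interpreter() -> str:
    # sys.executable is the path of the interpreter running this script;
    # compare it with the Python you just installed.
    v = sys.version_info
    return f"{sys.executable} (Python {v.major}.{v.minor}.{v.micro})"


def check_min_version(minimum: Tuple[int, int] = (3, 10)) -> bool:
    # The (3, 10) default is an illustrative assumption; set it to the version you expect.
    return sys.version_info[:2] >= minimum


if __name__ == "__main__":
    print(describe_interpreter())
    if not check_min_version():
        print("This interpreter is older than expected; VSCode may be using a different Python.")
```

Run it from VSCode's integrated terminal: if the printed path still points at WindowsApps\python3.9.exe rather than your new installation, VSCode is still using the old interpreter.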