SlothDB is a from-scratch C++20 embedded SQL database in active development. Same model as DuckDB and SQLite: query Parquet, CSV, JSON, Arrow, Avro, SQLite, and Excel files directly with SQL, ...
Most conversations around AI focus on models, prompts, embeddings, and orchestration. In production, that is rarely the hardest part. The hard part is usually upstream: ingesting ugly, oversized, ...
DuckDB is an embedded database, similar to SQLite, but designed for OLAP-style analytics. It is crazy fast and allows you to read and write data stored in CSV, JSON, and Parquet files directly, ...
Already using NumPy, Pandas, and Scikit-learn? Here are seven more powerful data wrangling tools that deserve a place in your toolkit. Python’s rich ecosystem of data science tools is a big draw for ...
In this post we’ll explore the Delta Lake Spark connector’s Z-Order command through both visualization and implementation. We’ll attempt to develop a visual intuition for what Z-Order does to the ...
Snowpark for Python gives data scientists a nice way to do DataFrame-style programming against the Snowflake data warehouse, including the ability to set up full-blown machine learning pipelines to ...
The table below shows my favorite go-to R packages for data import, wrangling, visualization and analysis — plus a few miscellaneous tasks tossed in. The package names in the table are clickable if ...