jaumpedro214/posts

Posts

A list of (some of) my posts and personal projects.

This repository gathers my main posts and projects on a single page. I prioritize posts written in English (and ones I'm proud of 😁).

I mainly write about Machine Learning and Data Science on Medium. You can visit my Medium profile to view all my posts.

The list

Title Link Tags Code
Creating a Text Preprocessing Microservice with FastAPI 🔗 🔗
Brazilian Laws analysis with TF-IDF and K-Means 🔗 🔗
Understanding Topic Coherence Measures 🔗 -
How to ensemble Clustering Algorithms 🔗 🔗
Improve Your Data Preprocessing with ColumnTransformer and Pipelines 🔗 -
Creating a Simple ETL Pipeline With Apache Spark 🔗 🔗
Machine Learning Streaming with Kafka, Debezium, and BentoML. 🔗 🔗
Stream Processing and Data Analysis with ksqlDB 🔗 🔗
A Fast Look at Spark Structured Streaming + Kafka 🔗 🔗
First Steps in Machine Learning with Apache Spark 🔗 🔗
Temporal and Geo-referenced Traffic Management with Python+Streamlit 🔗 🔗
Hands-On Introduction to Delta Lake with (py)Spark 🔗 🔗
Creating a Data Pipeline with Spark, Google Cloud Storage and Big Query 🔗 🔗
Data Pipeline with Airflow and AWS Tools (S3, Lambda & Glue) 🔗 🔗
Automatically Managing Data Pipeline Infrastructures With Terraform 🔗 🔗
Automatically Detecting Label Errors in Datasets with CleanLab 🔗 🔗
My First Billion (of Rows) in DuckDB 🔗 🔗

* Is used in almost every project