João Pedro – Medium

João Pedro in Towards Data Science

My First Billion (of Rows) in DuckDB

First Impressions of DuckDB handling 450Gb in a real project

12 min read · May 1, 2024

João Pedro in Towards Data Science

Anatomy of Windows Functions

Theory and practice of an underappreciated SQL operation

12 min read · 5 days ago

João Pedro in Towards Data Science

Automatically Detecting Label Errors in Datasets with CleanLab

A Tale of AI and wrongly-classified Brazilian Federal Laws

10 min read · Jul 22, 2023

João Pedro in Towards Data Science

Automatically Managing Data Pipeline Infrastructures With Terraform

I know the manual work you did last summer

15 min read · May 2, 2023

João Pedro in Towards Data Science

Data Pipeline with Airflow and AWS Tools (S3, Lambda & Glue)

Learning a little about these tools and how to integrate them

17 min read · Apr 6, 2023

João Pedro in Towards Data Science

Creating a Data Pipeline with Spark, Google Cloud Storage and Big Query

On-premise and cloud working together to deliver a data product

10 min read · Mar 6, 2023

João Pedro in Towards Data Science

Hands-On Introduction to Delta Lake with (py)Spark

Concepts, theory, and functionalities of this modern data storage framework

10 min read · Feb 16, 2023

João Pedro

Temporal and Geo-referenced Traffic Management with Python+Streamlit

Applying modern tools to visualize time and spatial data in a dashboard

10 min read · Jan 29, 2023

João Pedro in Towards Data Science

First Steps in Machine Learning with Apache Spark

Basic concepts and topics of Spark MLlib package

11 min read · Jan 4, 2023

João Pedro in Towards Data Science

A Fast Look at Spark Structured Streaming + Kafka

Learning the basics of how to use this powerful duo for stream-processing tasks

11 min read · Nov 5, 2022

João Pedro

Bachelor of IT at UFRN. Graduate of BI at UFRN — IMD. Strongly interested in Machine Learning, Data Science and Data Engineering.
