#PySpark
There are several 'rules' about the for loop, and it depends on the environment too. With experience you pick it up. At my previous job I used for loops in Python on enormous datasets and it went smoothly; NOW at my current job, if I use a for loop in PySpark you can bet it will crash the session lol
October 30, 2025 at 4:25 PM
Flatten nested JSON and XML dynamically in Spark using a recursive PySpark function for analytics-ready data without hardcoding. #dataengineering
How to Flatten Nested JSON and XML in Apache Spark
hackernoon.com
October 28, 2025 at 4:00 AM
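The recursive idea behind the flattening post above can be sketched in plain Python. This is not the linked article's Spark function: the name `flatten`, the `_` separator, and the choice to index list items are assumptions made for illustration. A Spark version would typically walk the DataFrame schema and explode arrays into rows instead.

```python
def flatten(record, parent_key="", sep="_"):
    """Recursively flatten nested dicts and lists into a single-level dict."""
    flat = {}
    if isinstance(record, dict):
        for key, value in record.items():
            # Prefix child keys with the path walked so far.
            child_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
            flat.update(flatten(value, child_key, sep))
    elif isinstance(record, list):
        for index, value in enumerate(record):
            # Assumption: list items become indexed columns (tags_0, tags_1, ...).
            child_key = f"{parent_key}{sep}{index}" if parent_key else str(index)
            flat.update(flatten(value, child_key, sep))
    else:
        # Leaf value (scalar or None): store it under the accumulated path.
        flat[parent_key] = record
    return flat
```

Because the function discovers keys as it recurses, no field names are hardcoded; empty dicts and lists simply produce no columns.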
Hi, loves. Good morning. A position just opened for a senior data scientist in the recommendations area where I work. It requires knowledge of data manipulation tools (SQL/PySpark), LLMs (RAG, LLMOps, etc.), and language models (BERT, Hugging Face, sentence transformers)
October 27, 2025 at 12:49 PM
How enormous is enormous?

I personally wouldn't use R for this. I would use Python to filter down to what you need and then import into R. For truly enormous CSVs you can use pyspark, or you can read them in line-by-line and only output rows to a new CSV if they pass the filters.
October 15, 2025 at 8:26 PM
Sounds large enough to want to use some sort of lazy loading. Both of the options above are lazy.

I would personally go with reading the file in line-by-line because pyspark can be finicky. It's probably slower but you only need to do it once, right?
October 15, 2025 at 8:42 PM
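The line-by-line approach suggested in the two replies above can be sketched with Python's standard `csv` module. The function name `filter_csv`, the UTF-8 encoding, and the assumption that the file has a header row are illustrative choices, not details from the thread.

```python
import csv


def filter_csv(src_path, dst_path, keep_row):
    """Stream src_path row by row and write only rows where keep_row(row) is true.

    Assumes the source file is UTF-8 and starts with a header row.
    Returns the number of rows written.
    """
    kept = 0
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:  # the reader yields one row at a time
            if keep_row(row):
                writer.writerow(row)
                kept += 1
    return kept
```

A call such as `filter_csv("big.csv", "small.csv", lambda r: r["country"] == "GB")` (hypothetical file and column names) would leave a smaller CSV that can then be imported into R.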
I'm super appreciative of @databard.bsky.social explaining PySpark in a way that doesn't assume any experience. So many things I wish I knew 4 months ago.
www.youtube.com/watch?v=2p2S...
ETL Approaches When Using Fabric PySpark - Jared Kuehn
YouTube video by Level Up Your Data
www.youtube.com
September 16, 2025 at 8:23 PM
📢 Henderson Scott is #hiring a Confluent Engineer - Apache Kafka, PySpark, Python!

💰 £500.00 - £550.00
GB
⏰ CONTRACTOR

🔗 http://jbs.ink/nMc2lVyj4p6L

#jobalert #jobsearch #python #spark #devops #sql #kafka #design
February 13, 2025 at 1:12 PM
pyspark...
April 9, 2024 at 8:44 AM
Ok, just looked at the benchmark overview and I’m a little disappointed. Comparing performance on a dataset of ~30 GBs tells me very little given it could all fit into RAM on commodity hardware. Like, the difference is up to chunking.
January 2, 2025 at 3:29 AM
A DuckLake reader for PySpark in ~30 lines of code. Despite looking simple, still partitions work across Spark nodes. Zero external dependencies, just the DuckDB JDBC driver. Try that with your favorite lakehouse technology :-)
July 2, 2025 at 7:20 PM
Customer 360: Fraud Detection in Fintech with PySpark and Machine Learning

"Every bank uses Customer 360 to keep customer records in a single unified view, and it can also be used to detect fraud.

What is Customer 360?

Customer 360 is a…

#ai #ml #news
Customer 360: Fraud Detection in Fintech With PySpark and ML
dzone.com
May 17, 2025 at 4:06 PM
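AWS Clean Rooms now supports configuring error messages for PySpark. If the participants approve, detailed error messages can be displayed, which speeds up analysis and makes troubleshooting more efficient. Collaborative analysis becomes possible without sharing data.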

aws.amazon.com/about-aws/wh...
August 20, 2025 at 11:19 PM
Banamex - Analytical Model Developer for Machine Learning (PySpark) – C11 - Ciudad de México, Distrito Federal, Mexico Job educativ.net/jobs/job/46173...
October 2, 2025 at 5:52 AM
I know how to build an ETL pipeline with a medallion architecture in PySpark, then create an ABT and run a little XGBoost model, son of a GUN
reflecting
November 7, 2025 at 9:28 AM
I got all the tests to pass, everything nice, everything pretty.

Now the damn JVM running behind PySpark is starting to give me trouble, and I don't have the faintest damn idea how to configure this piece of crap.
January 31, 2025 at 12:56 AM
- What is a distributed system?
- Have you used Spark? What is an RDD? When would you use PySpark and when Pandas?
November 28, 2024 at 11:07 AM
Today I was thinking about how quickly I learned PySpark. It was something I had flirted with for years, but I always thought it wasn't for me. And since I didn't use it day to day either, I never knew whether I could handle it. Pretty crazy
September 18, 2024 at 3:49 PM
This quick test in #MicrosoftFabric showed me a lot of interesting behaviors:
- There are rendering differences between Python and PySpark notebooks.
- You can use "temp" storage allocated to the notebook for many crazy ideas, even for rendering videos 😂 (if you then store it into Lakehouse)
January 25, 2025 at 11:00 AM
How to Fix Data Skew in Apache Spark with the Salting Technique

Learn how to fix data skew in Apache Spark using the salting technique for improved performance and balanced partitions in Scala and PySpark.

#hackernews #news
hackernoon.com
June 28, 2025 at 10:39 PM
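To see why salting helps with the skew described in the post above, here is a small pure-Python simulation rather than Spark code. Everything in it is an assumption made for illustration: the `crc32`-based `partition_of` stands in for a hash partitioner, and the key names, 8 partitions, and 16 salt values are arbitrary. It only covers the key-distribution half of the technique; in a salted join, the other side of the join is typically replicated once per salt value so the salted keys still match.

```python
import random
import zlib
from collections import Counter


def partition_of(key, num_partitions):
    """Deterministic stand-in for a hash partitioner (assumption: crc32 mod N)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_sizes(keys, num_partitions):
    """Count how many rows land in each partition."""
    counts = Counter(partition_of(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


def salt_keys(keys, num_salts, seed=0):
    """Append a random salt in [0, num_salts) to every key."""
    rng = random.Random(seed)
    return [f"{k}_{rng.randrange(num_salts)}" for k in keys]


# One "hot" key dominates the data set: 9,000 of 10,000 rows.
keys = ["hot"] * 9000 + [f"k{i}" for i in range(1000)]

plain = partition_sizes(keys, 8)                  # every "hot" row shares one partition
salted = partition_sizes(salt_keys(keys, 16), 8)  # "hot" rows spread over salted keys

print("largest partition without salt:", max(plain))
print("largest partition with salt:   ", max(salted))
```

Without the salt, every row for the hot key hashes to the same partition; with it, those rows are split across several salted keys and therefore several partitions.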
Databricks Architect – L1

Job title: Databricks Architect - L1
Company: Wipro
Job description: information, visit us at www.wipro.com. Databricks Architect · Should have minimum of 10+ years of experience... · Must have skills - DataBricks, Delta Lake, pyspark or scala spark, Unity Catalog · Good to have skills - Azure and/or AWS Cloud...
Expected salary:
Location: Bangalore, Karnataka
Job date: Sun, 01 Jun 2025 06:05:40 GMT
Apply for the job now!
findsuperdeals.shop
June 3, 2025 at 1:21 PM
Azure Databricks / Python / Pyspark Senior Data Engineer @ Exusia. Description: Sr Data Engineers & Tech Leads – Python/Pyspark/Databricks. Department: Sales and Delivery Team - Empower. Industry:...

aijobs.net
May 4, 2025 at 6:34 AM
May 20, 2025 at 7:24 AM
Python Data Analytics Engineer (Pyspark | Kafka | Data Pipelines) : Irvine, CA Job Title: Python Data Analytics Engineer (Pyspark | Kafka | Data Pipelines) Location: Irvine, CA (Hybrid/Onsite as required) Exp: 9 + Years minimum   Role …

corptocorp.org
September 11, 2025 at 4:49 PM
Why isn't #Rust replacing #scala and #pyspark as the main functional language in #spark? Is there an alternative to #spark that is built on #rust?
October 5, 2025 at 4:25 PM