An interesting article was posted to Reddit with the title “Will Rust Take over Data Engineering?” As I read it I had a few reactions to its contents, which I outline below.
“was Python made for Data Engineering in the first place?”
No, but neither was Rust. Who cares what it was made for? What matters is how well it addresses issues in the domain.
“Rust may not replace Python outright, but it has consumed more and more of JavaScript tooling.”
So what? What does that have to do with data engineering (the subject of the article)?
“The goal of Rust is to be a good programming language for creating highly concurrent, safe, and performant systems”
That’s nice, but I see concurrency as distributed between machines by something like Spark. I don’t want to complicate my thinking within a process with concurrency.
“It can be frustrating to fight every single mistake before being able to test or run a quick script”
It can also be frustrating to have to annotate your program with explicit types when you just want to sketch out and develop a solution.
I like Python’s gradual typing. But strong typing (which Rust has and Java does not) is a definite virtue.
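To illustrate what I mean by gradual typing, here is a minimal sketch. The function names and the record layout are invented for this example; the point is only that annotations can be layered on later without changing behavior, at which stage a checker such as mypy can verify callers.

```python
from typing import TypedDict

# Hypothetical example: names and data layout are invented for illustration.


# First pass: sketch the idea with no annotations at all.
def total_pages(docs):
    return sum(d["pages"] for d in docs)


# Later pass: describe the record shape once the design has settled.
class Doc(TypedDict):
    name: str
    pages: int


# Same logic as the sketch, now annotated for a static type checker.
def total_pages_typed(docs: list[Doc]) -> int:
    return sum(d["pages"] for d in docs)


docs: list[Doc] = [{"name": "a.pdf", "pages": 3}, {"name": "b.pdf", "pages": 5}]
print(total_pages(docs), total_pages_typed(docs))
```

Both versions run identically; the annotations only matter to the type checker and to the next person reading the code.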
“this article is not meant to be a deep dive into Rust, but rather map it to the field of data engineering.”
No it isn’t. It’s an article comparing Python and Rust for data engineering.
“The go-to language for data engineers is Python, quite a bad language for making it not break in production, as many engineers working with data will agree.”
I’m a data engineer working in Python and I do not agree. Copious logging has helped me figure out why things go wrong. And type errors are not the main issue I’ve encountered in the 3 years I’ve been on my current data engineering project.
Come to think of it, why do I not have a list of every production failure at hand?
Doing my best to remember every production break in our batch-style, Spark-driven, Autosys-commanded data engineering application, I can recall:
- PikePDF changed the exception hierarchy for the C++ library it wrapped and my specific error trapping did not catch the new exception.
- The devops team claimed to install the new version of our software but the log revealed the old version was still running.
- The small sample space used by the QA team did not cover all the various PDF files we were dealing with. When we reached production and were doing 1 million documents per month, spurious data in the PDF caused our PDF pipeline to choke.
Having the luxury of a trained QA team trying to break my application was more useful than a compiler detecting type errors.
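The first failure above suggests a defensive pattern: trap the specific exception you expect, but keep a logged catch-all behind it, so that a changed exception hierarchy becomes a recorded failure rather than a crashed run. A minimal sketch follows. `PdfReadError`, `NewLibraryError`, and `parse_pdf` are hypothetical stand-ins invented for illustration; they are not PikePDF’s actual API.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pdf_pipeline")


class PdfReadError(Exception):
    """Hypothetical: the exception the old library version raised."""


class NewLibraryError(Exception):
    """Hypothetical: raised after the upgrade, unrelated to PdfReadError."""


def parse_pdf(path):
    """Hypothetical parser that fails the way the upgraded library would."""
    if path.endswith("bad.pdf"):
        raise NewLibraryError(f"cannot parse {path}")
    return {"path": path, "pages": 1}


def process(paths):
    """Parse each file; record failures instead of aborting the batch."""
    parsed, failed = [], []
    for path in paths:
        try:
            parsed.append(parse_pdf(path))
        except PdfReadError:
            # The failure we anticipated when the code was written.
            log.warning("unreadable PDF: %s", path)
            failed.append(path)
        except Exception:
            # Safety net: logs the full traceback for anything unanticipated.
            log.exception("unexpected error on %s", path)
            failed.append(path)
    return parsed, failed


parsed, failed = process(["good.pdf", "bad.pdf"])
print(len(parsed), failed)
```

The catch-all is deliberately broad. In a batch job that handles a million documents a month, one logged and skipped file is usually preferable to a halted pipeline, and the traceback in the log shows which new exception type needs its own handler.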
TYPE ERRORS OCCUR WHEN YOU PASS THINGS AROUND: SIMULA-STYLE OO INVOLVES METHODS CALLING METHODS SO TYPE ERRORS ARE LESS OF AN ISSUE
I’m taking a strong stance here. But I do not pass much data around. I consider myself an object-relational data engineer. Rust encourages more of a functional style. So type-checking makes sense for Rust… although Common Lisp has done quite well without it.
“Defining expectations with data types and having vigorous checks at coding and compile time will prevent many errors.”
No it doesn’t. Most of the errors/bugs in my code had to do with strategic failure:
- What if you don’t clear the data directory before the new run starts?
- What if you don’t export the environment variable in the correct part of the autosys job?
- What if one process completes before the other starts?
- What if the disk is full?
Most of my problems were SITUATIONAL – I can’t think of one time a type issue made it into production.
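Situational failures like these can at least be checked for before a run starts. Here is a minimal sketch of such a preflight check. The function name, the environment variable name, and the free-space threshold are all invented for illustration.

```python
import os
import shutil
import tempfile
from pathlib import Path


def preflight(data_dir, required_env, min_free_bytes):
    """Return a list of human-readable problems; an empty list means go."""
    problems = []
    data_dir = Path(data_dir)

    # Are there leftover files from a previous run?
    if not data_dir.is_dir():
        problems.append(f"{data_dir} does not exist")
    elif any(data_dir.iterdir()):
        problems.append(f"{data_dir} is not empty")

    # Was each variable exported where this process can see it?
    for name in required_env:
        if not os.environ.get(name):
            problems.append(f"environment variable {name} is not set")

    # Is there room to write output?
    if data_dir.is_dir():
        free = shutil.disk_usage(data_dir).free
        if free < min_free_bytes:
            problems.append(f"only {free} bytes free on {data_dir}")

    return problems


# Usage with a throwaway directory and a made-up variable name.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "stale.csv").write_text("left over from the last run")
    os.environ.pop("PIPELINE_RUN_DATE", None)
    problems = preflight(tmp, ["PIPELINE_RUN_DATE"], min_free_bytes=0)
    for p in problems:
        print(p)
```

This covers the stale directory, the missing variable, and the full disk. Ordering between processes is a scheduler concern and is not addressed here. None of these checks is something a type system would have caught.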
“Less relevant for data engineers, but super helpful: speed”
Hmm, my programs run fast enough, and when we need speed we farm out more servers.
But for a single-processor situation, speed could be important. It just hasn’t happened for me.
“Find more on Rust Once, Run Everywhere.”
The linked-to article does a pathetic job of showing that Rust can run in many places – it doesn’t list the 4-5 frameworks that compile to WebAssembly from Rust.
Personally I would reach for HaXe over Rust any day for write-once, run many places because it is a more approachable language.
“Rust is a more complex language to learn, but it was the most loved technology for seven years (2022, 2021, 2020, 2019, 2018, 2017, 2016) in a row on the Stack Overflow survey”
OK, but how many people love Rust? The userbase of Python is 10-100 times that of Rust.
Try getting a high-level executive or a business team member to love Rust. They would take one look at all those ampersands and brackets and say:
“Are you trying to return us to PERL?”
(Non-technical person confronted with learning/using Rust.)
“Why Rust is Popular?”
Who said it is popular? All you did was provide a single data point that it was loved:
- How many chemists and biologists are going to drop SciPy for Rust?
- How many introductory computer science classes are going to drop Python for Rust?
- How many people who see all the functionality on PyPI are going to go: “you know what, I think I will rewrite all of this in Rust”?
- How many non-technical people are going to prefer Rust?
“Rust vs. Python (SQL)”
Why is “SQL” in the title of this section? The section says nothing about SQL, and Python and SQL are not synonymous.
“It’s also a shift from an interpreted language such as Python to a more Functional Language (FP) style, which Rust certainly supports.”
I think you mean it’s a shift from OBJECT-ORIENTED to functional. That is the more meaningful difference.
“The downsides of Python: Mobile development”
Kivy. Nothing more need be said. And again, this article was about Python and Rust for data engineering.
“There is also lots of adoption in Python with … more Python and Functional Programming style.”
I’m not aware of this. Stream-based programming with streaming data frameworks is certainly widely used at the corporate level, but I don’t think Coconut and other single-CPU functional programming tools are widely used at the corporate (or academic) level.
I think you are making this statement without a shred of evidence to support it.
“The Rust projects we have seen above are excellent and will continue to grow for vital and core components, but for them to be helpful for the average data engineer.”
This is a nonsense, incompletely formed sentence. You are making no sense whatsoever here.