Zach Wilson

These are the best posts from Zach Wilson.

22 viral posts with 39,389 likes, 1,978 comments, and 598 shares.
4 image posts, 0 carousel posts, 1 video post, and 17 text posts.

Best Posts by Zach Wilson on LinkedIn

A mistake many data engineers make is thinking real-time = streaming pipelines!

Whenever a stakeholder says they want data in real time, you shouldn't default to dreams of Flink, Kafka, and watermarks.

You should clarify precisely how much latency is acceptable for this use case.

Often, when a stakeholder asks for real-time data, the need can be met with an hourly batch pipeline. The incremental benefit of jumping from hourly batch to streaming isn't worth it, because streaming breaks the homogeneity of your suite of pipelines and makes the maintenance overhead much higher!

Sometimes stakeholders say “real-time” when what they mean is “predictable refresh rates.” This is a sign that you, as a data engineer, need to do a better job of setting SLAs for when your pipelines will refresh.
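One way to make the “how much latency is acceptable?” conversation concrete is to compare the agreed latency against the worst-case staleness of each batch cadence. The sketch below is a simplification, not a method from the post: the cadence list, the `worst_case_staleness` and `choose_pipeline` names, and the assumption that a job's runtime is the same at every cadence are all hypothetical.

```python
from datetime import timedelta

# Hypothetical batch cadences, ordered from least to most frequent.
BATCH_CADENCES = [
    ("daily batch", timedelta(days=1)),
    ("hourly batch", timedelta(hours=1)),
]


def worst_case_staleness(interval: timedelta, runtime: timedelta) -> timedelta:
    # A record arriving right after a batch window opens waits the full interval
    # for that window to close, then waits for the job itself to finish.
    return interval + runtime


def choose_pipeline(acceptable_latency: timedelta, runtime: timedelta) -> str:
    """Return the least frequent cadence whose worst-case staleness fits the SLA."""
    for name, interval in BATCH_CADENCES:
        if worst_case_staleness(interval, runtime) <= acceptable_latency:
            return name
    # Only when no batch cadence can meet the agreed latency do we fall back to streaming.
    return "streaming"


print(choose_pipeline(timedelta(hours=2), timedelta(minutes=15)))   # hourly batch
print(choose_pipeline(timedelta(days=2), timedelta(minutes=30)))    # daily batch
print(choose_pipeline(timedelta(minutes=5), timedelta(minutes=1)))  # streaming
```

A stakeholder who says “real-time” but accepts two hours of delay lands on an hourly batch here, which is exactly the conversation the post recommends having before reaching for Flink or Kafka.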

#dataengineering
Today was only the second time I’ve ever been to Airbnb’s office, and it was to return my work laptop. I worked there for over 500 working days and went into the office once.

Thank you Airbnb for being remote-first and a genuinely 21st-century technology company!
You raised the bar for me on what to expect from companies in terms of lifestyle freedom.

Return to Office now equals “I’m not applying to that company!”

#airbnb
#remoteworking
#dataengineering
C’mon, Netflix! Is the range really $150k to $900k? I thought y’all added levels!

Imagine being on a team of engineers. Y’all are all senior engineers. One of y’all is making $150k and another is making $900k doing the same job.

That’s what this range implies, right, Netflix? That people in this role on the same team will be making anywhere in this range?

Why would you put the overall market range when people care about the range for this specific job?

California transparency laws aren’t having quite the effect we would hope yet!

#compensation
I won’t be at the Databricks AI summit this year even though it’s across the street from my apartment.

Here’s why:

Last year, a Databricks cofounder asked me if he could use my content in his keynote at the summit.

I said yes. He used two different YouTube videos.

After the summit, this cofounder urged me to teach Databricks.

I started doing so in January and have taught Databricks to over 1,000 students since then.

Databricks also became the single biggest line item cost to my business.

Databricks has a startups program where they give people $50,000 in credits.

They rejected me from this program because, “they do not see value in working with me”

So, I won’t be attending this year because they aren’t very kind to bootstrapped startups.

They say I should pass that cloud cost onto the student.

Databricks stands alone here in burning students looking for affordable education.

The following companies do give me credits to allow for more affordable cloud education:

- Amazon Web Services (AWS)
- Astronomer
- Starburst
- Snowflake
- Confluent

Please support these companies because they care about students.

If you were hoping to see me this year, sorry, I won’t be there.

Thanks for understanding! Hopefully I will see you guys next year!

Please repost to increase awareness!
I know data engineers at Netflix who know only Python and SQL and make $500k.
You don’t need to know the high performance languages to make a killing as a data engineer!

#dataengineering
You shouldn't try to learn all of data engineering at once! You'll get overwhelmed and feel like you aren't making any progress!

A piecemeal approach to eating the data engineering elephant is better.

Start with:
- SQL
Get good with SELECT, GROUP BY, WHERE, HAVING, JOIN, etc. DataLemur 🐒 (Ace the SQL & Data Interview) is a great resource to get into this.

Then branch into:
- Python
Get good with loops, variables, classes, dictionaries, tuples, and arrays. LeetCode still seems like the best place to practice this.

Then branch into:
- job orchestration
Airflow is the most popular option here, but the startup cost of getting going is kind of high. A newer option is Mage, which is very easy to get started with and can orchestrate things as well as Airflow.

Then branch into:
- distributed compute (Snowflake, Spark, BigQuery, etc.)
Almost all of these platforms have free trials. Some key things here to learn about are partitioning, memory, broadcast joins, and caching.

Then branch into:
- data modeling
Learn about fact tables, dimension tables, slowly changing dimensions, cumulative table design, and change data capture. Reading one of Bill Inmon's books about this will get you ahead here!
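The first step on that path, SQL with SELECT, WHERE, GROUP BY, HAVING, and JOIN, can be practiced locally without any cloud account. This is a minimal sketch using Python's built-in `sqlite3` module; the `users` and `orders` tables and their rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, country TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL);
INSERT INTO users VALUES (1, 'US'), (2, 'US'), (3, 'CA');
INSERT INTO orders VALUES (10, 1, 50.0), (11, 1, 25.0), (12, 2, 40.0), (13, 3, 10.0), (14, 3, -5.0);
""")

rows = conn.execute("""
SELECT u.country, COUNT(*) AS num_orders, SUM(o.amount) AS revenue
FROM orders AS o
JOIN users AS u ON u.user_id = o.user_id  -- JOIN attaches each order to its user's country
WHERE o.amount > 0                        -- WHERE filters rows before grouping
GROUP BY u.country                        -- one output row per country
HAVING SUM(o.amount) > 20                 -- HAVING filters groups after aggregation
ORDER BY revenue DESC
""").fetchall()

print(rows)  # [('US', 3, 115.0)]
```

The negative Canadian order is dropped by WHERE, and the remaining Canadian total of 10 is then dropped by HAVING, which is the WHERE-vs-HAVING distinction worth internalizing early.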


#dataengineering
Hey everybody! I’m excited to announce I’m going to be opening my calendar up to anybody for a 30-minute #dataengineering career guidance Google Meet call from 8 to 9 AM Pacific starting tomorrow for the next two weeks.

The cost of the call is $50 and all proceeds over the next two weeks will be donated to Code to Inspire, a charity that teaches women and girls in Afghanistan how to code!
Code to Inspire’s website: https://lnkd.in/etg4dmBT

EDIT: We’re fully booked now! 22 sessions and raising $1100 for women and girls in Afghanistan.

Let’s help raise $1000 for those in need by getting the career help you need!
Data engineering isn’t just Spark
Full stack engineering isn’t just NextJS
Data analytics isn’t just Tableau
Analytics engineering isn’t just dbt
Data science isn’t just XGBoost
Machine learning isn’t just fine-tuning models
Prompt optimization isn’t just AdalFlow

Stop boiling entire fields down to one technology and recognize every field here has a ton of breadth!
I was on a first date with a woman today and one of you randomly came up to me and was like, “Your videos changed my life, dude, thank you!”

And the woman was like, “Does that happen often?”

And I was like, “Only every day of my life.”

That moment makes the last 5 years of shouting endlessly into the void about data pipelines worth it!

Content creation is crazy!
If you’re less than 30 years old and scared you don’t have things “figured out”

I blew up my life at 29.

I stepped off the “career ladder”

I no longer idolized being a principal engineer!

I reinvented myself at 29 and my life got much better!

I have a theory that I’ll be reinventing myself every 7 years.

So even if you have it “figured out” now, you won’t in the future! And that should take some of the pressure off!

You don’t need to solve all of the world’s problems today.

Just make your bed, write some code and be grateful that you’re healthy!

#dataengineering
YouTube has paid me $411 so far for releasing five data engineering boot camp videos.

That’s less than 25% of one student’s tuition.

This is why most educators don’t publish on YouTube. Educational content doesn’t garner enough eyeballs to be worth it.

Do I regret doing this? Not even a little bit.

This is why:

- the free content has supercharged my January boot camp sales. Instead of averaging 1-2 sales ($2000-5000) per day, I’m averaging 4-5 sales ($8000-12000) per day.

Releasing the free boot camp has generated $60k in paid boot camp sales in the same week.

So maybe more educators need to quit gatekeeping their knowledge and realize that releasing quality content on YouTube is NOT ZERO SUM. You make the pie bigger and you help out people who cannot afford it!

#dataengineering
Every SQL concept you should know to ace #dataengineering interviews:

- Basics
SELECT, FROM, WHERE, GROUP BY, ORDER BY and HAVING

- Window functions
Know the difference between RANK, DENSE_RANK, and ROW_NUMBER

Know how PARTITION BY and ORDER BY work in the OVER clause

Know about the QUALIFY clause if you really want to wow the interviewer

- JOINs
Know when self-joins work well
Know LEFT vs FULL OUTER vs INNER joins
Know how to handle skewed joins and the tradeoffs of the various approaches

- Advanced analytic functions
Know how to leverage GROUPING SETS, ROLLUP and CUBE

Know how to create your own UDFs to enhance your SQL

- Arrays
Know about CROSS JOIN UNNEST / LATERAL VIEW EXPLODE
Know how to TRANSFORM and REDUCE array values

What did I miss?
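The window-function items above are easy to see side by side. This minimal sketch uses Python's `sqlite3` module on a made-up `salaries` table (SQLite supports window functions from version 3.25, which recent Python builds bundle). SQLite has no QUALIFY clause, so the last query filters the window result in an outer query over a CTE; engines such as Snowflake and BigQuery let you write that filter directly with QUALIFY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE salaries (team TEXT, engineer TEXT, salary INTEGER);
INSERT INTO salaries VALUES
    ('data', 'a', 200), ('data', 'b', 200), ('data', 'c', 150),
    ('infra', 'd', 300), ('infra', 'e', 250);
""")

RANKED = """
SELECT team, engineer, salary,
       -- ROW_NUMBER never ties; the engineer tie-breaker keeps it deterministic
       ROW_NUMBER() OVER (PARTITION BY team ORDER BY salary DESC, engineer) AS row_num,
       -- RANK gives ties the same number and then skips (1, 1, 3)
       RANK()       OVER (PARTITION BY team ORDER BY salary DESC) AS rnk,
       -- DENSE_RANK gives ties the same number without gaps (1, 1, 2)
       DENSE_RANK() OVER (PARTITION BY team ORDER BY salary DESC) AS dense_rnk
FROM salaries
"""

rows = conn.execute(RANKED + " ORDER BY team, salary DESC, engineer").fetchall()
for row in rows:
    print(row)

# Without QUALIFY, filter on the window column in an outer query over a CTE.
top_per_team = conn.execute(
    "WITH ranked AS (" + RANKED + ") "
    "SELECT team, engineer FROM ranked WHERE row_num = 1 ORDER BY team"
).fetchall()
print(top_per_team)  # [('data', 'a'), ('infra', 'd')]
```

PARTITION BY restarts the numbering for each team, and ORDER BY inside OVER decides who gets number 1 within that team.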
Being good at #dataengineering is WAY more than being a Spark or SQL wizard.

You need to be good at questioning things and pushing back on low-value requests, communicating with your downstream users to find out their pain points, and forging a data model that they understand and that scales.

You need to be able to work well as a team. Data scientists and software engineers are often in your value chain. Learning how to excite and motivate the people in your value chain will truly launch your career into the stratosphere.

You need to be good at stress management. Data engineering usually involves juggling many requests in parallel. Being able to breathe and focus on the highest value requests first is critical for long-term success as a data engineer.
Jan 6th, 2016 is a day that I will always remember. It was simultaneously the scariest and best day of my life. It marks 7 years of sobriety for me.

I was so sick and tired of being stuck in a rut with drugs. I would get sober for a month or two and then fall right back into it. It was a dance I did from 2009 to 2015. I was done! I wanted something better! I was done throwing away my potential!

I decided to move away. Far away. From Salt Lake City to Alexandria, Virginia.

No “friends” pulling me down. Just sitting with my thoughts so I could focus on my success. Changing my environment changed my chances of success.

7 months after getting sober was when I landed my big break doing #dataengineering for Facebook.

After a year of being sober, so many aspects of my #mentalhealth improved.

I used to have panic attacks almost every day. I now rarely have them.

I used to not sleep much at all. Now I sleep soundly.

I used to not be very confident that I was smart and could do great things. Now I’m fearless with my capabilities and I know I’m going to do great things.

Getting sober was the single best decision I made to build my success!
Ad-hoc SQL queries and SQL queries running in production will generally look pretty different. Copy-pasting the data scientist’s query into Airflow isn’t quite enough to count as “putting it into production.”

Here are some high-level things to look for in ad-hoc queries that should be changed before they move to production:

1. GROUP BY 1,2,3 / ORDER BY 1,2,3
Positional references speed up writing ad-hoc queries. Please spell out the column names in production.

2. SELECT *
This grabs all the columns quickly for ad-hoc queries. Please list the columns explicitly in production.

3. LIMIT 1000
Limits should generally be removed when moving to production since you want the entire data set.

4. Subqueries
Subqueries should almost always be abstracted into CTEs when running in production.

5. WHERE date >= startDate
This should be switched to WHERE date BETWEEN startDate AND endDate. Otherwise your pipeline won’t be idempotent and will produce different results depending on when it’s run.
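The checklist above can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch: the `events` table, its rows, and the `start_date`/`end_date` parameter names are hypothetical. It contrasts an ad-hoc query (positional GROUP BY/ORDER BY, open-ended date filter, LIMIT) with a production-style rewrite (named columns, a CTE, a bounded date window, no LIMIT), then reruns both after data for a later day arrives.

```python
import sqlite3

# Ad-hoc style: positional GROUP BY/ORDER BY, open-ended date filter, LIMIT.
ADHOC_QUERY = """
SELECT event_date, action, COUNT(*)
FROM events
WHERE event_date >= :start_date
GROUP BY 1, 2
ORDER BY 1, 2
LIMIT 1000
"""

# Production style: explicit columns, a CTE instead of a subquery,
# a bounded date window, and no LIMIT.
PRODUCTION_QUERY = """
WITH windowed_events AS (
    SELECT event_date, user_id, action
    FROM events
    WHERE event_date BETWEEN :start_date AND :end_date
)
SELECT event_date, action, COUNT(*) AS num_events
FROM windowed_events
GROUP BY event_date, action
ORDER BY event_date, action
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_date TEXT, user_id INTEGER, action TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", [
    ("2024-01-01", 1, "click"),
    ("2024-01-01", 2, "view"),
    ("2024-01-02", 1, "click"),
])

# ISO-8601 date strings compare correctly as text in SQLite.
params = {"start_date": "2024-01-01", "end_date": "2024-01-02"}
adhoc_before = conn.execute(ADHOC_QUERY, params).fetchall()
prod_before = conn.execute(PRODUCTION_QUERY, params).fetchall()

# Data for a later day lands; rerunning the same job should not change its output.
conn.execute("INSERT INTO events VALUES ('2024-01-03', 3, 'view')")
adhoc_after = conn.execute(ADHOC_QUERY, params).fetchall()
prod_after = conn.execute(PRODUCTION_QUERY, params).fetchall()

print(prod_before == prod_after)    # True: the bounded window is idempotent
print(adhoc_before == adhoc_after)  # False: the open-ended filter picked up new rows
```

Rerunning the production query for the same window gives the same answer no matter when it runs, which is what makes backfills and retries safe.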

#dataengineering
You can’t just hire a data scientist or a data engineer and expect to unlock all the value out of your data.

You need to treat your data as an investment. You invest in infrastructure and it pays you back over time.

This infrastructure has many pieces:

- logging
- pipelines
- analytics
- models
- experimentation
- decision making processes

You need to hire people to handle different pieces of this stack.

If the data isn’t actually incorporated into decision making processes, then you’re still missing a huge part of the infrastructure even if you have a rockstar data team.

#dataengineering
#datascience
Many data engineers get filtered during the data modeling round of the #dataengineering interview.

Some key things to know:

- when to use normalized vs denormalized data
- diagramming skills to sketch out the one->many, many->many relationships
- be able to talk soundly about dimension, fact, and aggregate tables
- be able to talk about efficient table designs like cumulative, slowly changing dimension, and delta tables

Some key things to do in the interview:

- ask about schema
- clarify your relationship assumptions
- ask about business use case and query patterns

Candidates who do these things tend to get hired.
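To make “slowly changing dimension” concrete, here is a minimal in-memory sketch of one common variant, Type 2, where each attribute change closes the old row and opens a new one with validity dates. The post doesn't say which SCD type it means; the `DimUserRow` and `apply_scd2` names, the country attribute, and the choice to keep users who are missing from a snapshot unchanged are all hypothetical. In a warehouse this would usually be a SQL MERGE or a join-based pipeline rather than Python.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DimUserRow:
    user_id: int
    country: str
    valid_from: date
    valid_to: Optional[date]  # None means this version is still current
    is_current: bool


def apply_scd2(dim: list[DimUserRow], snapshot: dict[int, str], as_of: date) -> list[DimUserRow]:
    """Merge one day's snapshot (user_id -> country) into an SCD Type 2 dimension."""
    history = [row for row in dim if not row.is_current]
    current = {row.user_id: row for row in dim if row.is_current}
    result = list(history)

    for user_id, row in current.items():
        new_country = snapshot.get(user_id)
        if new_country is None or new_country == row.country:
            # Unchanged (or missing from today's snapshot): keep the open version.
            result.append(row)
        else:
            # Attribute changed: close the old version and open a new one.
            result.append(DimUserRow(user_id, row.country, row.valid_from, as_of, False))
            result.append(DimUserRow(user_id, new_country, as_of, None, True))

    for user_id, country in snapshot.items():
        if user_id not in current:
            # First time we see this user: open their first version.
            result.append(DimUserRow(user_id, country, as_of, None, True))
    return result


dim = apply_scd2([], {1: "US"}, date(2024, 1, 1))
dim = apply_scd2(dim, {1: "CA", 2: "MX"}, date(2024, 2, 1))
for row in dim:
    print(row)
```

Because the old row is kept with its validity range, a fact table can join on `valid_from`/`valid_to` to recover which country a user was in on any given date.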
Databricks featured my YouTube video from 2021 in a keynote today!

I didn’t expect my pandemic hair to be broadcast to so many people live at the same time 😆!


#dataengineering
Jan 6th, 2016 - the day I decided to get sober.

I packed up my bags and drove from Salt Lake City to Washington DC.

I was scared. I didn’t know what this new life was going to have in store for me.

I was just so sick and tired of being sick and tired!

So I left Utah, despite all my friends wanting me to stay.

That day will always be etched into my soul because I overcame the inertia of my hometown and spread my wings and flew.

7 months after getting sober, I landed a job at Facebook and I knew my life was set. I could finally give up that survival mindset and start focusing on thriving!

Please remember if you’re suffering, you’re one good decision away from a brand new life!
Data engineers come in a few levels:

- level 1
Knows Python and SQL. Can move data from point A to point B so long as it’s not too big

- level 2
Knows distributed compute basics like BigQuery and Spark. Can move data around on the order of single terabytes

- level 3
Masters distributed compute and can build pipelines of arbitrary size

- level 4
Actually talks with stakeholders before building pipelines
The best part of being a solopreneur is never having to use JIRA ever again!

#softwareengineering
#mentalhealth
When I worked at Airbnb, something data engineering hiring managers would say was, “We don't have much success hiring Meta data engineers.”

The reason was that Meta's definition of the data engineer role and Airbnb's definition were quite different!

Airbnb wanted to hire a different data engineering archetype.

Airbnb looked for strong data structures and algorithm skills, strong SQL, software engineering fundamentals, and Scala/Java programming experience.

Meta looked for strong analytical skills, strong SQL, visualization skills, some data structures and algorithms, and decent Python skills. This skill set aligned more closely with the analytics engineer role at Airbnb!

This can make hiring and interviewing for data engineering roles very frustrating!

Before you do a full onsite loop with a big tech company, make sure you know what type of data engineering role you're interviewing for!

#dataengineering
