
Posts about Data engineering

Posted 5 days ago

Hello everyone,

Some of my posts about DE projects (for portfolio) were well received in this subreddit. (e.g. this and this)

But many readers reached out with difficulties in setting up the infrastructure, CI/CD, automated testing, and database changes. With that in mind, I wrote this article https://www.startdataengineering.com/post/data-engineering-projects-with-free-template/ which sets up an Airflow + Postgres + Metabase stack and can also set up the AWS infra to run it, using the following tools:

  1. Local development: Docker & Docker Compose

  2. DB migrations: yoyo-migrations

  3. IaC: Terraform

  4. CI/CD: GitHub Actions

  5. Testing: Pytest

  6. Formatting: isort & black

  7. Lint check: flake8

  8. Type check: mypy
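
To give a feel for what the testing, formatting, lint, and type-check items cover, here is a minimal sketch of a typed function with pytest-style tests. The names (`Order`, `total_completed_revenue`) are made up for illustration and are not taken from the template repo.

```python
"""Hypothetical example: every name here is illustrative, not from the template."""
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    order_id: int
    amount_usd: float
    status: str


def total_completed_revenue(orders: list[Order]) -> float:
    """Sum the amounts of orders whose status is 'completed'."""
    return sum(o.amount_usd for o in orders if o.status == "completed")


# pytest collects functions whose names start with `test_`.
def test_total_completed_revenue_ignores_other_statuses() -> None:
    orders = [
        Order(1, 10.0, "completed"),
        Order(2, 5.5, "cancelled"),
        Order(3, 4.5, "completed"),
    ]
    assert total_completed_revenue(orders) == 14.5


def test_total_completed_revenue_empty_input() -> None:
    assert total_completed_revenue([]) == 0
```

Assuming the file is saved under a name like `test_revenue.py`, running `pytest`, `mypy`, `flake8`, `black --check`, and `isort --check-only` against it exercises items 5 to 8 locally, and the same commands can be run in CI.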

I also updated the below projects from my website to use these tools for easier setup.

  1. DE Project Batch edition Airflow, Redshift, EMR, S3, Metabase

  2. DE Project to impress Hiring Manager Cron, Postgres, Metabase

  3. End-to-end DE project Dagster, dbt, Postgres, Metabase

An easy-to-use template helps people start building data engineering projects (for their portfolio) and gives them a good understanding of commonly used development practices. Any feedback is appreciated. I hope this helps someone :)

TL;DR: Data infra is complex; use this template for your portfolio data projects.

Blog: https://www.startdataengineering.com/post/data-engineering-projects-with-free-template/
Code: https://github.com/josephmachado/data_engineering_project_template

361
35 comments
Posted 3 days ago

I'm a server-side/backend developer with a year of experience developing web services with Java and Spring. For the last 3 months, I've been using Apache Beam's Java SDK to write ETL pipelines for GCP Dataflow.

My work role now requires me to make a permanent switch to DE, and complete the Databricks Certified Associate Developer for Apache Spark certification.

The catch is that the cert is only offered for either Python or Scala, but I'm currently only proficient in Java. I'm willing to learn either of these languages, but I'm unsure which one to go forward with. My dilemma is as follows:

  1. I dislike Python, but I feel like it's the lingua franca of DE and that it's not really possible to make a career in this field without mastering it.

  2. I love Scala, but I feel that it might not be very useful for DE beyond this cert.

  3. At my workplace we use all 3, but Scala is the rarest.

Any input would be appreciated, Thanks.

54
35 comments
Posted 14 hours ago

Fundamentals of Data Engineering has received a lot of good reviews in the data world, and I’ve been meaning to read it. Turns out lots of folks have been meaning to read it too, so I put together a book club!

I'm not being paid to promote this book in any way, shape, or form. I have no business affiliation with the authors. I just want to read it in a book club.

Here’s a brief overview of the book club:

- All participants read the book independently, and then we meet bi-weekly for 30 mins to discuss key takeaways, questions, hot takes, etc.

Schedule:

We currently have 20+ people participating! If this sounds interesting to you, find additional details and schedule here.

114
35 comments
Posted 24 days ago

I am seeing quite a few posts on here from people asking for advice on their resumes or describing their difficulties in landing a Data Engineering position, or even getting an interview at all.

I want to share some tips that helped me as someone who is also fairly new to the Data Engineering world. My previous work experience consisted of doing basic SQL/Excel work for the first two years of my career and then moving into more of a Database Developer position for another two years. One of the things that discouraged me the most was that I had a pretty bad 3-year gap in tech-related employment. I was laid off in 2019 and decided to finish a degree I was working on in Data Science, which seemed to be the hottest job title back in 2018-2019. I'm not saying it isn't hot anymore, but Data Engineering seems to be an even hotter and more in-demand title at the moment. I got caught up with collecting unemployment, working odd jobs off the books, DoorDashing, etc. for a few years, but this past summer I decided it was time to get back into the tech world before it was too late.

So in July I made it my mission to get a job by the end of the summer so I could start when my two kids went back to school.

First off, the only platform I used to apply for jobs was LinkedIn. It seemed like the most professional one and provided the most information about the types of jobs I was applying for. I had never used LinkedIn Premium before, but because there was a 1-month free trial I enabled that too, which gave me better insights into which jobs I would be a top candidate for.

Here are the main tips I have for navigating the application process on LinkedIn:

  1. Get your Easy Apply set up so you can apply in one click, or click your way through the simple questions they have on the job postings.

  2. Skill Assessments: This is a big one. Any time I applied for a job on LinkedIn, it would offer me a skill assessment test, and you receive a badge if you score over a certain percentage. Many of them also came with videos and mini courses to help you pass. I would take any skill assessment I felt I had a shot at passing; even if you fail the first time, it lets you retry twice, I believe, and you get an idea of what the questions are like. So I took every single one I could (SQL, Excel, Python, Azure, Power BI, R, and a few more I can't remember off the top of my head). The only ones I didn't take were the ones I had no chance of passing because I had no prior experience with them, such as Java development, JavaScript libraries I've never used, and C++. But because all the jobs I was applying for were data related, I didn't encounter those too much.

I know for a fact this helped considerably, because when I received responses from recruiters, they would show a copy of what my application looked like on their end, and it would say that so-and-so has 3/3 of the required skills. So they know you aren't just making stuff up on your resume; you are showing them that you know at least the basics.

  3. Become familiar with a cloud platform. The biggest change I have seen since 2019 is the massive shift towards cloud-based platforms, software as a service, and all that good stuff. I went with Azure just because my previous experience was mostly in Microsoft SQL Server. Azure offers a 1-month free trial, and with it you get a $200 credit. I signed up for this and began using a bunch of different tools related to data engineering, mostly Azure Data Factory, but I also spun up an Azure SQL Server instance, an Azure Cosmos database (which is free for a year), and a Linux VM. I made some basic pipelines in Data Factory to ETL data from one source to another and learned quite a bit about using the PowerShell and Bash command line interfaces. When people asked me about this stuff in interviews, I was able to answer basic questions about it, and it showed them I was interested in learning new things on my own.

  4. Check the number of applicants for the LinkedIn jobs. A lot of people only spam the Easy Apply button, which of course I did too. But for many of the jobs that don't offer this option and require a full application on the company's HR platform, I noticed the number of applicants was much lower. The job I have now was one of the ones I couldn't Easy Apply for, and I saw it only had 20 or so applicants, as opposed to the hundreds you would see for the Easy Apply ones. Don't skip over these jobs just because the application process will take a few more minutes, especially if it's one you really want.

  5. Apply for any job you feel you can do. The job says 5 years of experience required but you only have 3? Apply anyway. Shoot as high as you can. Those are the requirements for the ideal candidate, but they know they aren't always going to find someone with that much experience. Don't get discouraged if you don't meet the minimum requirements. This is a numbers game, and you want to apply for as many jobs as you possibly can, especially since so many have gone remote.

  6. During the interviews, if you don't know something, just be honest with them. Don't try to BS your way around it, because this will only make you look worse if they find out you are being dishonest. Showing that you are willing to learn new things looks a lot better than getting caught in a lie.

  7. Ask for criticism on your resume. This subreddit seems to be a great source for that, as well as the SQL one, Data Science, or programming subs in general. Ask someone to do a mock interview too if you can.

  8. Most important one: don't give up! There are so many jobs available out there. Don't get discouraged. Keep learning new things. Learn from your previous mistakes. You will get one eventually.

I started applying for jobs in mid-July and got so many offers for interviews that I had to start rejecting them. Out of the 10 interviews I did, 8 made it to the second round and 5 to the third. The job I ended up getting actually involved 5 separate interviews. On the day I started that job, I got an offer for one of the other ones I had applied for. It paid slightly more, but it was a 9-month contract/hourly job and didn't seem as interesting as the one I had already started, so I turned it down. A few of the others where I made it to the third round either went with a different candidate or just ghosted me, which seems very common these days. Don't take it personally if this happens to you; some people just don't have the common courtesy to get back to you.

Anyway I hope this is helpful or encouraging to anyone who reads it and if you have any questions for me please feel free to ask.

214
53 comments