Today at work, I heard one guy say something along the lines of "Yea we can move the data to trix and then slurpy it to plex" and I just had to hold in my laugh for a few minutes. Who the fuck comes up with this terminology ahahahaha

259

55 comments

Posted 13 hours ago

Career

What is the McKinsey of Data Science Consulting?

I live in Berlin, Germany, and am looking for a job in data science consulting. During my studies I gathered a total of 6 years of data-related work experience, have decent grades, and am now looking for a fast-paced job in data (science/engineering) consulting. I am willing to put in long hours and can learn quickly, so I'm looking for a consultancy that does top-notch projects for fairly large companies. Does anyone know any suitable companies?

30 comments

Crossposted 17 hours ago

Projects

[N] Mozilla launched a responsible AI challenge and I'm stoked about it

/r/Mach...

Posted 2 days ago

News

who's applying and what are you planning to build??? https://www.axios.com/2023/03/15/mozilla-responsible-ai-challenge

30 points

8 comments

0 comments

Posted 1 hour ago

Projects

Do I have to pay now for the Twitter API if I want to use it for data analysis?

I want to work on research projects that use Twitter data for text mining. However, I don't know the exact new rules for the Twitter API since Elon Musk bought Twitter and changed the API specifics. I see that you can apply for academic access, but this seems quite hard. Does anyone know if I can still get access to the Twitter API without paying?

1 comment

Posted 11 hours ago

Discussion

Introducing Microsoft 365 Copilot | Your Copilot for Work

blogs.microsoft.com/blog/2...

11 comments

Posted 18 hours ago

Discussion

What is your digital workspace, tools, setup, etc. for ETL, research, production?

I'm new to this, so I've been wanting to know what other people use to make their work feel as smooth as butter. Since I've been learning a lot, and not just the industry-standard stuff, I wanted to share what little I've found valuable that others may want to try. The main goal of this post is to share, critique, and provide suggestions so that we can all find the setup we like most. I am also looking for new, up-and-coming tech, and definitely not afraid to try new things!

IDE: VSCode with the Jupyter Notebook extension. What I like about it is that I can view data structures like series/dataframes in a table format by clicking the variable in the Jupyter: Variables pane at the bottom. I started with plain vanilla Jupyter notebooks from Anaconda, so this was pretty nice. I have seen demos showing that JupyterLab has something like this, so if anyone has used both VSCode's notebooks and JupyterLab, your input would be appreciated. I hear good things about PyCharm and Spyder. Some people also use Google Colab, DataSpell, and DeepNote, but I don't know enough about them. I did play around with DeepNote, and it was very cool, but I didn't feel compelled to switch (and you have to pay for it!).

Tools:

A code helper: A few months back I was googling everything and I would've listed Stack Overflow. I might still use that occasionally, but these days I use ChatGPT and Bing AI. For more current or news-based info I'll use Bing AI since it uses live search results, and for knowledge-based information I might use ChatGPT. ChatGPT saves conversations, so it's great for exploring topics in depth and referencing that conversation later. For those who have used both, maybe you know what I'm talking about and can provide a better explanation as to which is better for what purpose.
Software: Excel is an obvious one. For instance, if I have a huge dataset and I just want to delete columns that I don't need, selecting them with Ctrl+click is easier and quicker than copy-pasting or typing out each of the string column names I want to pass to "df.drop()". Excel is great for quick and simple stuff. Some other software I have been learning about is what I would consider no- or low-code data analytics platforms, such as Alteryx, KNIME, and Orange. These tools let you run practically an entire ETL pipeline. I believe Alteryx and KNIME are the gold standard in this category, and Orange is a "lite" version of the two and is available in Anaconda. I think these are pretty cool. I personally haven't found a huge use case for them since I've been chugging away in my notebooks with Python, but I can see the value. Would love for someone to chime in on these tools and how they compare to manually doing stuff in code, especially for large datasets.
Version Control: This is where I'm primarily lacking, but I know that Git (usually hosted on GitHub) is the go-to. I don't use it, but I know that a ton of people do. I don't even know where to start, to be honest. I usually just create a new .ipynb file for each analysis or phase of an ETL pipeline haha. I'm also not too aware of what other innovative tools for version control exist.
Python Libraries: Besides the obvious stuff like Pandas/NumPy, Matplotlib/Seaborn, and your popular ML libraries, I've recently found out about this library called Polars. It's basically a DataFrame library like Pandas, but written in Rust, and it's super powerful. Some operations that would've taken hours with Pandas took me minutes. But I've been hearing that Pandas 2.0, which will be released some time this month, will support PyArrow-backed dtypes (if I recall correctly), and that the speed is comparable to Polars. I mean these two are FAST. Another contender is DuckDB, but I think the new Pandas and Polars are still faster. I mostly use Pandas, but if there is some heavy lifting, I'll swap the dataframe to a Polars one with a quick function, run it with Polars, then go back to Pandas.

Anyway, that's just some things I can immediately think of. Looking forward to your suggestions! Bonus points for anything new and innovative. Cheers.

https://preview.redd.it/qj2cywt1r4oa1.png?width=1920&format=png&auto=webp&v=enabled&s=01626413e867c03a18d40309ea3a7fdd16c4064a

13 comments