r/Database


how much faster are different databases than other methods?

I was looking at databases, and how much faster is MariaDB or MySQL compared to, for example, 500 .xls files searched with a Python script? I mean, if I have to get rows 50-100 from a file with a specific name, how much faster does MariaDB/MySQL do that than a Python script that does the same?
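There is no single speedup number; it depends on indexing, parsing cost, and how the data is laid out. One honest way to answer is to benchmark both paths yourself. The sketch below is a toy comparison, with two stand-in assumptions: SQLite (from Python's standard library) stands in for MariaDB/MySQL, and CSV text stands in for .xls files; the helper names `db_lookup` and `scan_lookup` are made up for illustration.

```python
import csv
import io
import sqlite3

# Build 500 in-memory "files" (CSV text stands in for .xls), 200 rows each.
files = {}
for i in range(500):
    buf = io.StringIO()
    writer = csv.writer(buf)
    for row in range(200):
        writer.writerow([row, f"value_{i}_{row}"])
    files[f"file_{i:03d}.csv"] = buf.getvalue()

# Database approach: one table holding every row, with an index on
# (filename, row number) so the lookup does not scan all 100,000 rows.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE rows (fname TEXT, rownum INTEGER, val TEXT)")
db.executemany(
    "INSERT INTO rows VALUES (?, ?, ?)",
    [(name, int(r[0]), r[1])
     for name, text in files.items()
     for r in csv.reader(io.StringIO(text))],
)
db.execute("CREATE INDEX idx ON rows (fname, rownum)")

def db_lookup(name):
    # The index lets the engine jump straight to this file's rows 50-100.
    return db.execute(
        "SELECT rownum, val FROM rows"
        " WHERE fname = ? AND rownum BETWEEN 50 AND 100 ORDER BY rownum",
        (name,),
    ).fetchall()

def scan_lookup(name):
    # The "Python script" approach: parse the whole target file, then slice.
    rows = list(csv.reader(io.StringIO(files[name])))
    return [(int(r[0]), r[1]) for r in rows if 50 <= int(r[0]) <= 100]

target = "file_250.csv"
assert db_lookup(target) == scan_lookup(target)
```

Wrap each call in `timeit` to see the gap on your own data; with real .xls files the parsing overhead alone usually dominates, which is a big part of why databases win.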





Best database/system architecture for a large number of real-time high-rate IOT devices

What would be the best database or architecture of database + data engine for the following requirements:

  • I will receive IOT sensor data that is almost always ordered by the timestamp of the signal itself. The exception is network delay between packets, in which case a late signal will most likely have to be inserted among the last 1-3 signals.

  • If no changes occur, the latest row will be updated instead of doing a new insert.

  • A signal is received from each IOT sensor every 1-5 seconds with a total of ~100,000 devices.

  • Each IOT sensor signal is between 200 B and 2,000 B, with an average size of around 500 B.

  • Each IOT sensor returns a well-defined set of fields. There are around 200 possible field types; although a sensor can send all 200, each one usually sends a set of 20-40 of them depending on user configuration. So, in terms of storage, most columns in each row will be NULL.

  • A smaller subset of around 10 fields will always be provided by all IOT sensors.

  • Availability matters more than consistency, since most data will be consumed by the client for real-time monitoring rather than through the database. Historical report queries against the database will mostly reach well into the past, so eventual consistency is enough.

  • Old data (let's say 3+ months) is deleted.

  • I need partition tolerance with distributed writes on all nodes.

Conclusions until now:

  • Partition tolerance needed

  • Availability > consistency

  • NULL fields should take no space

  • Row-oriented storage on disk, so that write speed is high

Based on these, I am considering Cassandra/ScyllaDB with partitions based on device ID and day, like 01_01_2024.
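To sanity-check that partitioning idea without a cluster, here is a small database-free Python sketch of how a composite (device ID, day) partition key fans a 1-week history query out into per-day partition reads that can run in parallel; the function names and the device ID are made up for illustration.

```python
from datetime import date, timedelta

def partition_key(device_id: int, day: date) -> str:
    # Composite partition key: device ID plus day, e.g. "1234_01_01_2024",
    # matching the DD_MM_YYYY day format from the post.
    return f"{device_id}_{day.strftime('%d_%m_%Y')}"

def partitions_for_range(device_id: int, start: date, end: date) -> list[str]:
    # A time-range query over one device touches one partition per day;
    # those reads can be issued concurrently against different nodes.
    days = (end - start).days + 1
    return [partition_key(device_id, start + timedelta(n)) for n in range(days)]

# A 1-week playback query for device 1234 fans out into 7 partition reads.
keys = partitions_for_range(1234, date(2024, 1, 1), date(2024, 1, 7))
```

Day-sized partitions also cap partition growth: at one signal every 1-5 seconds and ~500 B per signal, a device writes at most ~86,400 rows (~43 MB) per partition per day, which stays well within Cassandra/ScyllaDB's comfortable partition sizes.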

However, I have some concerns with regards to query patterns:

Users can retrieve a list of complete signals over a time range, like a 1-week period, for a specific device.

  • Parallel queries across the different partitions solve this.

  • This query would be the most used by users to get a history playback of sensor data.

Basic aggregation/conditionals over a specific device, like displaying a graph of a certain field with 1-hour averages as data points over a span of 1 month or more.

  • Parallel queries could work here as well, but they are less performant: we would have to touch many more partitions and disk blocks unnecessarily, since each query only needs 1 field at a time. Something like a column-oriented database solves this... but not Cassandra/ScyllaDB.

Basic aggregation/conditionals over all or many devices on a specific field, like finding the points where a custom limit was exceeded over a time range of 1 month.

  • Benefits from a column-oriented database as well.

What is the best database(s)/architecture for this, or can it all be solved with Cassandra/ScyllaDB?