Post your projects and questions, or look for inspiration to get started with any programming language! 🚀🚀🚀🚀🚀🚀 FAQ: https://devs-arg.github.io/faq
Let's look at something that seems curious at first:
    overojuancho >touch zero.txt
    overojuancho >echo -n "a" > one.txt
    overojuancho >echo -n "ñ" > two.txt
    overojuancho >ls -ltr --time=birth
    total 8
    -rw-rw-r-- 1 juancho juancho 0 abr 20 14:49 zero.txt
    -rw-rw-r-- 1 juancho juancho 1 abr 20 14:49 one.txt
    -rw-rw-r-- 1 juancho juancho 2 abr 20 14:49 two.txt
    overojuancho >
The empty file is 0 bytes, the file with the letter "a" created without a trailing newline (the -n option of echo) is 1 byte, and the file with the ñ is... 2 bytes?
What is going on here?
If I ask the file command for information about the file, it tells me the following:
    overojuancho >file two.txt
    two.txt: Unicode text, UTF-8 text, with no line terminators
Well, that explains it. UTF-8 is the default character encoding on this Linux system. But before getting there, a trip back in time.
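The shell experiment above can be reproduced from Python. This is a minimal sketch; the helper name `utf8_file_size` and the temporary file name are made up for the illustration, and the file is written explicitly as UTF-8 so the result does not depend on the system locale.

```python
import os
import tempfile


def utf8_file_size(text: str) -> int:
    """Write text to a temporary file as UTF-8 (no newline added) and return its size in bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.txt")  # hypothetical file name
        with open(path, "w", encoding="utf-8", newline="") as fh:
            fh.write(text)
        return os.path.getsize(path)


for text in ("", "a", "\u00f1"):  # "\u00f1" is the letter ñ
    print(repr(text), utf8_file_size(text))
```

The three sizes should match the `ls` listing: nothing, one byte, and two bytes.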
In the early 1960s, a system was designed in the United States to encode the uppercase letters of the English alphabet, plus the digits, the punctuation marks, and some control characters. A later revision added the lowercase letters. This standard was named ASCII, after the initials of American Standard Code for Information Interchange.
ASCII has some rather interesting properties. It uses 7 bits and splits them into one part for the group and another for the character within the group.
A byte would look like this:
| 0 | x | x | x | x | x | x | x |
The most significant bit is always 0.
The next two most significant bits can be:
- 0 0 => | 0 | 0 | 0 | x | x | x | x | x | => control characters
- 0 1 => | 0 | 0 | 1 | x | x | x | x | x | => punctuation marks and digits
- 1 0 => | 0 | 1 | 0 | x | x | x | x | x | => uppercase letters (for the most part)
- 1 1 => | 0 | 1 | 1 | x | x | x | x | x | => lowercase letters (for the most part)
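The grouping above can be checked by isolating the two bits that follow the leading 0. The sketch below is only an illustration; `GROUPS` and `ascii_group` are names invented here, and the labels simply repeat the four cases from the list.

```python
# Labels for the two bits that follow the always-zero top bit.
GROUPS = {
    0b00: "control characters",
    0b01: "punctuation marks and digits",
    0b10: "uppercase letters (mostly)",
    0b11: "lowercase letters (mostly)",
}


def ascii_group(ch: str) -> str:
    """Return the group of a single ASCII character, based on bits 6 and 5."""
    code = ord(ch)
    if code > 0x7F:
        raise ValueError(f"{ch!r} is not an ASCII character")
    return GROUPS[(code >> 5) & 0b11]


for ch in ("\n", "7", "A", "a", "@"):
    print(repr(ch), ascii_group(ch))
```

The "@" sign lands in the uppercase group, which is why the list says "for the most part".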
Since an uppercase letter differs from its lowercase counterpart in only one bit, flipping that bit is enough to switch between lowercase and uppercase. An example in Python:
    >>> chr(ord('a') ^ (1 << 5))
    'A'
    >>> chr(ord('A') ^ (1 << 5))
    'a'
    >>>
In C it is much more direct, since a char variable is just a small integer, typically 8 bits wide:
    c ^ (1 << 5)
where c is the variable that holds the character.
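One caveat with the bit flip is that it only makes sense for letters: applied to a digit such as "1" (0x31) it produces 0x11, a control character. A hedged sketch of a guarded version follows; `toggle_case` is a name made up for this example and it deliberately handles only the 26 unaccented ASCII letters.

```python
def toggle_case(ch: str) -> str:
    """Flip bit 5 only when ch is an ASCII letter; return anything else unchanged."""
    code = ord(ch)
    is_upper = 0x41 <= code <= 0x5A  # 'A'..'Z'
    is_lower = 0x61 <= code <= 0x7A  # 'a'..'z'
    if is_upper or is_lower:
        return chr(code ^ (1 << 5))
    return ch


print("".join(toggle_case(ch) for ch in "Hola Mundo 123"))
```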
Later on, with the need to include characters from other languages, the famous Latin-1 standard was developed. But I won't discuss it here because I would get too sidetracked.
Over time, a standard, Unicode, was developed to address all the limitations of the earlier systems. Although I refer to it, I won't go into Unicode itself either, but rather into the way its code points are represented in a stream of bytes:
UTF-8
Speaking of tangents, and to get back to why the ñ takes two bytes: the designers of UTF-8 came up with a very ingenious scheme.
UTF-8 can encode every Unicode character, and it does so in a very practical way.
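- It can use from 1 to 4 bytes, as needed.
- ASCII characters map directly to one-byte UTF-8, which makes it backward compatible with ASCII.
- It preserves order: sorting the encoded bytes gives the same order as sorting by code point.
- It is self-synchronizing: the start of a character can be found from any position in the stream.
Among other features.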
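Two of these properties are easy to try out. The sketch below is only an illustration with invented helper names: `sorts_like_code_points` compares sorting by code point with sorting by encoded bytes, and `char_start` walks back over continuation bytes (those of the form 10xxxxxx) to find where a character begins.

```python
def sorts_like_code_points(words: list[str]) -> bool:
    """Check that sorting by UTF-8 bytes matches sorting by code points."""
    by_code_point = sorted(words)  # Python compares str by code point
    by_bytes = sorted(words, key=lambda w: w.encode("utf-8"))
    return by_code_point == by_bytes


def char_start(data: bytes, pos: int) -> int:
    """Walk back from pos to the first byte of the character that contains it."""
    while pos > 0 and (data[pos] & 0xC0) == 0x80:  # 0b10xxxxxx: continuation byte
        pos -= 1
    return pos


print(sorts_like_code_points(["zorro", "\u00f1and\u00fa", "a\u00f1o", "ana"]))

data = "a\u00f1o".encode("utf-8")  # b'a\xc3\xb1o'
print(char_start(data, 2))  # position 2 is the second byte of ñ
```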
How does it encode all the Unicode characters?
- It represents code points in the range 0x0000 to 0x10FFFF.
- If the first byte has a 0 in the most significant bit, it uses 1 byte and maps directly to ASCII. It represents the code points in the range 0x0 to 0x7F.
- If the first byte starts with the sequence [1|1|0], it uses 2 bytes, and the second byte has to start with the sequence [1|0]. That leaves 11 bits available, which represent code points in the range 0x80 to 0x7FF.
- If the first byte starts with the sequence [1|1|1|0], it uses 3 bytes, and the next two start with [1|0]. That leaves 16 bits available, which represent code points in the range 0x800 to 0xFFFF.
- If the first byte starts with [1|1|1|1|0], it uses 4 bytes, and the next three must start with [1|0]. That leaves 21 bits available to represent code points in the range 0x10000 to 0x10FFFF.
RFC 3629, which defines UTF-8, has this nice table that summarizes the above:
Char. number range | UTF-8 octet sequence |
---|---|
0000 0000-0000 007F | 0xxxxxxx |
0000 0080-0000 07FF | 110xxxxx 10xxxxxx |
0000 0800-0000 FFFF | 1110xxxx 10xxxxxx 10xxxxxx |
0001 0000-0010 FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx |
In addition, the range 0xD800 to 0xDFFF is excluded.
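Read in the other direction, the table is also a recipe for encoding. The following is a minimal sketch written for this post, not code taken from the RFC; `encode_code_point` is an invented name, and Python's built-in `str.encode` is only used to compare results.

```python
def encode_code_point(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8, following the table row by row."""
    if cp < 0 or cp > 0x10FFFF:
        raise ValueError(f"0x{cp:x} is outside the Unicode range")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"0x{cp:x} is in the excluded range 0xD800-0xDFFF")
    if cp <= 0x7F:  # 0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:  # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:  # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


for ch in ("I", "\u00f1", "\u263a", "\U0001fa77"):
    assert encode_code_point(ord(ch)) == ch.encode("utf-8")
    print(f"U+{ord(ch):04X}", encode_code_point(ord(ch)).hex(" "))
```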
So what about the ñ?
Since the ñ is not in the ASCII range, it does not map to a 1-byte UTF-8 sequence but to 2 bytes. With Python we can see it like this:
    >>> letra_bytes = 'ñ'.encode('utf-8')
    >>> letra_bytes
    b'\xc3\xb1'
    >>> len(letra_bytes)
    2
    >>> lst = [bin(x) for x in letra_bytes]
    >>> lst
    ['0b11000011', '0b10110001']
We can see that the ñ is encoded in UTF-8 as two bytes, where the first starts with [1|1|0], indicating that it uses 2 bytes, and the second starts with [1|0].
If we strip the header from both bytes and join what is left to form the code point, we get:
    >>> codepoint_str = lst[0].removeprefix('0b110') + lst[1].removeprefix('0b10')
    >>> int(codepoint_str, 2)
    241
    >>> chr(241)
    'ñ'
Manipulating the string representation of a binary number is illustrative but not very practical. A simple function to decode a stream of bytes in Python could be the following:
    def decode(raw: bytes) -> str:
        codepoints = list()
        length = len(raw)
        index = 0
        while index < length:
            if raw[index] < 0x80:  # 1-byte encoding: 0b0xxxxxxx (plain ASCII)
                code_point = raw[index]
                index += 1
            elif (raw[index] & 0xE0) == 0xC0:  # 2-byte encoding: 0b110xxxxx
                # Remove the header and shift 6 places to the left to
                # make room for the OR operation.
                first_byte = (raw[index] & 0x1F) << 6
                second_byte = raw[index + 1] & 0x3F
                code_point = first_byte | second_byte
                index += 2
            elif (raw[index] & 0xF0) == 0xE0:  # 3-byte encoding: 0b1110xxxx
                first_byte = (raw[index] & 0x0F) << 12
                second_byte = (raw[index + 1] & 0x3F) << 6
                third_byte = raw[index + 2] & 0x3F
                code_point = first_byte | second_byte | third_byte
                index += 3
            elif (raw[index] & 0xF8) == 0xF0 and (raw[index] <= 0xF4):
                # 4-byte encoding: 0b11110xxx. The leading byte is also
                # checked against 0xF4 because Unicode ends at 0x10FFFF
                # and the maximum valid 4-byte UTF-8 sequence is
                # 0xF4 0x8F 0xBF 0xBF.
                first_byte = (raw[index] & 0x07) << 18
                second_byte = (raw[index + 1] & 0x3F) << 12
                third_byte = (raw[index + 2] & 0x3F) << 6
                fourth_byte = raw[index + 3] & 0x3F
                code_point = first_byte | second_byte | third_byte | fourth_byte
                index += 4
            else:
                code_point = 0xFFFD  # Unicode replacement character
                index += 1
            if 0xD800 <= code_point <= 0xDFFF:
                msg = f"0x{code_point:x} is not valid in UTF-8 (reserved for UTF-16 surrogates)"
                raise ValueError(msg)
            codepoints.append(code_point)
        return "".join(chr(c) for c in codepoints)


    if __name__ == "__main__":
        raw_bytes = b"\x49\xc2\xbb\xe2\x98\xba\xf0\x9f\xa9\xb7"
        decoded: str = decode(raw_bytes)
        print(decoded)
        raise SystemExit(0)
Note that it does not check for many errors, and if the input bytes are truncated I can get an IndexError exception. Also, for the bitwise operations I use hexadecimal literals instead of binary ones, for example 0xF8 instead of 0b11111000. The idea is for it to be just an example.
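To give an idea of what the missing checks could look like, here is a hedged sketch of a helper that validates one sequence before it is decoded. The names `sequence_length` and `check_sequence` are invented for this illustration and are not part of the function above; the sketch still ignores other problems such as overlong encodings.

```python
def sequence_length(first_byte: int) -> int:
    """Return how many bytes the sequence should have, or 0 for an invalid leading byte."""
    if first_byte < 0x80:
        return 1
    if (first_byte & 0xE0) == 0xC0:
        return 2
    if (first_byte & 0xF0) == 0xE0:
        return 3
    if (first_byte & 0xF8) == 0xF0 and first_byte <= 0xF4:
        return 4
    return 0


def check_sequence(raw: bytes, index: int) -> int:
    """Return the length of the sequence starting at index, or raise ValueError."""
    length = sequence_length(raw[index])
    if length == 0:
        raise ValueError(f"invalid leading byte 0x{raw[index]:02x} at {index}")
    tail = raw[index + 1:index + length]  # slicing never raises IndexError
    if len(tail) != length - 1:
        raise ValueError(f"truncated sequence at {index}")
    if any((b & 0xC0) != 0x80 for b in tail):
        raise ValueError(f"bad continuation byte after {index}")
    return length


print(check_sequence(b"\xc3\xb1", 0))
```

Calling it before reading `raw[index + 1]` and beyond would turn the IndexError into a ValueError with a clearer message.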
A more interesting approach, one that does not use "if", is discussed in the branchless UTF-8 decoder article listed in the sources.
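As a rough illustration of deciding without "if", the sequence length can be looked up in a table indexed by the top five bits of the first byte. This is only a sketch of that one idea under my own naming (`LENGTHS`, `length_from_table`), not a full branchless decoder.

```python
# One entry per value of (first_byte >> 3), 32 entries in total:
#   0xxxx -> 1 byte, 10xxx -> 0 (continuation byte, not a valid start),
#   110xx -> 2 bytes, 1110x -> 3 bytes, 11110 -> 4 bytes, 11111 -> 0 (invalid).
LENGTHS = [1] * 16 + [0] * 8 + [2] * 4 + [3] * 2 + [4, 0]


def length_from_table(first_byte: int) -> int:
    """Return the sequence length for a leading byte without any conditional."""
    return LENGTHS[first_byte >> 3]


raw_bytes = b"\x49\xc2\xbb\xe2\x98\xba\xf0\x9f\xa9\xb7"
index = 0
while index < len(raw_bytes):
    step = length_from_table(raw_bytes[index])
    print(raw_bytes[index:index + step].hex(" "))
    index += step  # safe here because every leading byte in the sample is valid
```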
Sorry if this turned out a bit long and confusing; I read more than I write, so I lack the "muscle" for writing. I just wanted to share things I learn along the way whenever something catches my attention.
Sources:
- ASCII
- UTF-8
- RFC 3629
- Unicode
- branchless UTF-8 decoder