The ñ takes up two bytes
r/devsarg


Let's look at something that seems curious at first:

overojuancho >touch zero.txt
overojuancho >echo -n "a" > one.txt
overojuancho >echo -n "ñ" > two.txt
overojuancho >ls -ltr --time=birth
total 8
-rw-rw-r-- 1 juancho juancho 0 abr 20 14:49 zero.txt
-rw-rw-r-- 1 juancho juancho 1 abr 20 14:49 one.txt
-rw-rw-r-- 1 juancho juancho 2 abr 20 14:49 two.txt
overojuancho >

The empty file has 0 bytes, the file with the letter "a" created without a trailing newline (echo's -n option) has 1 byte, and the file with the ñ has... 2 bytes?

What is going on here?

If I ask the file command for information about the file, it tells me the following:

overojuancho >file two.txt 
two.txt: Unicode text, UTF-8 text, with no line terminators

Well, that explains it. UTF-8 is the default character encoding on this Linux system. But before we get there, a trip back in time.

In the early 1960s, a system was designed in the United States to encode the uppercase letters of the English alphabet, plus the digits, the punctuation marks, and a number of control characters. A later revision added the lowercase letters. This standard was named ASCII, for American Standard Code for Information Interchange.

ASCII has some rather interesting properties. It uses 7 bits and splits them into one portion for the group and another for the character within the group.

A byte would look like this:

| 0 | x | x | x | x | x | x | x |

The most significant bit is always 0.

The next two most significant bits can be:

  • 0 0 => | 0 | 0 | 0 | x | x | x | x | x | => control characters

  • 0 1 => | 0 | 0 | 1 | x | x | x | x | x | => punctuation marks and digits

  • 1 0 => | 0 | 1 | 0 | x | x | x | x | x | => uppercase letters (for the most part)

  • 1 1 => | 0 | 1 | 1 | x | x | x | x | x | => lowercase letters (for the most part)
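
The grouping above can be checked with a small sketch. The function name ascii_group and the group labels are made up for this illustration; it only reads bits 6 and 5 of the character code:

```python
def ascii_group(ch: str) -> str:
    """Return the ASCII group of a single character, based on bits 6 and 5."""
    code = ord(ch)
    if code > 0x7F:
        raise ValueError(f"{ch!r} is not an ASCII character")
    groups = {
        0b00: "control",
        0b01: "punctuation/digits",
        0b10: "uppercase (mostly)",
        0b11: "lowercase (mostly)",
    }
    # Shift right 5 places and keep two bits: these are the "group" bits.
    return groups[(code >> 5) & 0b11]


for ch in ["\n", "7", "A", "a"]:
    print(f"{ord(ch):08b} -> {ascii_group(ch)}")
```

The "mostly" is there because each letter group also holds a few symbols, such as @ and [ in the uppercase group.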

Since an uppercase letter differs from its lowercase counterpart in only one bit, flipping that bit is enough to toggle between lowercase and uppercase. An example in Python:

>>> chr(ord('a') ^ (1 << 5))
'A'
>>> chr(ord('A') ^ (1 << 5))
'a'
>>> 

In C it is much more direct, since a variable of type char is just a small integer (8 bits on practically every platform):

c ^ (1 << 5), where c is the variable holding the character
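
One caveat: flipping bit 5 only makes sense for letters. Applied to '@' it yields a backtick, for example. A minimal sketch with a guard could look like this (toggle_case is a name invented for the example, and it deliberately handles ASCII letters only):

```python
def toggle_case(ch: str) -> str:
    """Flip bit 5 of an ASCII letter; return any other character unchanged."""
    code = ord(ch)
    is_ascii_letter = (ord("A") <= code <= ord("Z")) or (ord("a") <= code <= ord("z"))
    if not is_ascii_letter:
        return ch
    return chr(code ^ (1 << 5))


print("".join(toggle_case(c) for c in "Hello, World 42!"))  # hELLO, wORLD 42!
```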

Later on, with the need to support characters from other languages, the ISO/IEC 8859-1 standard was developed, the well-known Latin-1. But I am not going to discuss it here because I would go too far off on a tangent.

Eventually a standard was developed to overcome the limitations of all the earlier systems. Although I will refer to it, I am not going to go into detail about Unicode either, but rather about how Unicode code points are represented in a stream of bytes.

UTF-8

Speaking of tangents, and to get back to why the ñ takes up two bytes: two guys came up with a very ingenious system over dinner.

UTF-8 can encode every Unicode character, and it does so in a very practical way.

  • It uses 1 to 4 bytes, as needed.

  • ASCII characters map directly to single-byte UTF-8, which makes it backward compatible with ASCII.

  • It preserves ordering: comparing the encoded bytes gives the same order as comparing the code points.

  • It is self-synchronizing: from any position in the stream you can find where a character starts.

among other features.
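
Two of these properties are easy to see in Python. This is just an illustrative sketch: the word list is arbitrary and char_start is a helper made up for the example.

```python
words = ["zebra", "ñandú", "año", "Ñu", "🩷", "☺"]

# Order preservation: Python compares str by code point and bytes by byte
# value, and both sorts end up in the same order.
by_code_point = sorted(words)
by_utf8_bytes = sorted(words, key=lambda w: w.encode("utf-8"))
print(by_code_point == by_utf8_bytes)  # True


def char_start(data: bytes, index: int) -> int:
    """Walk back from index to the first byte of the character containing it."""
    # Continuation bytes always look like 10xxxxxx, so they are easy to skip.
    while index > 0 and (data[index] & 0xC0) == 0x80:
        index -= 1
    return index


data = "añ☺".encode("utf-8")  # b'a\xc3\xb1\xe2\x98\xba'
print(char_start(data, 5))  # 3, the first byte of ☺
```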

How does it encode every Unicode character?

  • It represents code points in the range 0x0000 to 0x10FFFF.

  • If the first byte has a 0 in its most significant bit, the sequence uses 1 byte and maps directly to ASCII. It represents code points in the range 0x0 to 0x7F.

  • If the first byte starts with the sequence [1|1|0], it uses 2 bytes and the second byte must start with the sequence [1|0], leaving 11 bits available to represent code points in the range 0x80 to 0x7FF.

  • If the first byte starts with the sequence [1|1|1|0], it uses 3 bytes and the next two must start with [1|0], leaving 16 bits available to represent code points in the range 0x800 to 0xFFFF.

  • If the first byte starts with [1|1|1|1|0], it uses 4 bytes and the next three must start with [1|0], leaving 21 bits available to represent code points in the range 0x10000 to 0x10FFFF.

RFC 3629, which defines UTF-8, has this nice table that summarizes the above:

Char. number range  |        UTF-8 octet sequence
   (hexadecimal)    |              (binary)
--------------------+---------------------------------------------
0000 0000-0000 007F | 0xxxxxxx
0000 0080-0000 07FF | 110xxxxx 10xxxxxx
0000 0800-0000 FFFF | 1110xxxx 10xxxxxx 10xxxxxx
0001 0000-0010 FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

In addition, the range 0xD800 to 0xDFFF is excluded, since it is reserved for UTF-16 surrogates.
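
Going in the encoding direction, the rules above can be sketched as a small function. The name encode_code_point is invented for this example, and in real code you would simply call str.encode("utf-8"); the point here is only to show the bits being moved around:

```python
def encode_code_point(cp: int) -> bytes:
    """Encode a single Unicode code point as UTF-8, following the table above."""
    if cp < 0 or cp > 0x10FFFF:
        raise ValueError(f"0x{cp:x} is outside the Unicode range")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"0x{cp:x} is reserved for UTF-16 surrogates")
    if cp <= 0x7F:  # 0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:  # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:  # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


for ch in "Iñ☺🩷":
    # Compare against Python's built-in encoder.
    print(ch, encode_code_point(ord(ch)) == ch.encode("utf-8"))
```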

So what about the ñ?

Since the ñ is not in the ASCII range, it does not map to 1 UTF-8 byte but to 2. In Python we can see it like this:

>>> letra_bytes = 'ñ'.encode('utf-8')
>>> letra_bytes
b'\xc3\xb1'
>>> len(letra_bytes)
2
>>> lst = [bin(x) for x in letra_bytes]
>>> lst
['0b11000011', '0b10110001']

We can see that the ñ is encoded in UTF-8 as two bytes, where the first starts with [1|1|0], indicating a 2-byte sequence, and the second starts with [1|0].

If we strip the header from each of the two bytes and join what is left to form the code point, we get:

>>> codepoint_str= lst[0].removeprefix('0b110') + lst[1].removeprefix('0b10')
>>> int(codepoint_str,2)
241
>>> chr(241)
'ñ'

Manipulating the string representation of a binary number is illustrative but not very practical. A simple function to decode a stream of bytes in Python could be the following:

def decode(raw: bytes) -> str:
    codepoints = list()
    longitude = len(raw)
    index = 0

    while index < longitude:
        if raw[index] < 0x80:  # check for UTF-8 1 byte encoding
            code_point = raw[index]
            index += 1
        elif (raw[index] & 0xE0) == 0xC0:  # check for UTF-8 2 byte encoding
            # starting with 0b110xxxxx
            # strip the 110 header and shift 6 places to the left
            # to make room for the OR with the second byte
            first_byte = (raw[index] & 0x1F) << 6
            second_byte = raw[index + 1] & 0x3F
            code_point = first_byte | second_byte
            index += 2
        elif (raw[index] & 0xF0) == 0xE0:  # check for UTF-8 3 byte
            # encoding starting with 0b1110xxxx
            first_byte = (raw[index] & 0x0F) << 12
            second_byte = (raw[index + 1] & 0x3F) << 6
            third_byte = raw[index + 2] & 0x3F
            code_point = first_byte | second_byte | third_byte
            index += 3
        elif (raw[index] & 0xF8) == 0xF0 and (raw[index] <= 0xF4):  # Checks
            # if it is a 4-byte UTF-8 encoding scheme that starts with
            # 0b11110xxx. It also checks that it's within
            # the valid Unicode range, which goes up to 0x10FFFF,
            # because the maximum valid 4-byte UTF-8 sequence
            # is: 0xF4 0x8F 0xBF 0xBF
            first_byte = (raw[index] & 0x07) << 18
            second_byte = (raw[index + 1] & 0x3F) << 12
            third_byte = (raw[index + 2] & 0x3F) << 6
            fourth_byte = raw[index + 3] & 0x3F
            code_point = first_byte | second_byte | third_byte | fourth_byte
            index += 4
        else:
            code_point = 0xFFFD  # Unicode replacement character
            index += 1

        if 0xD800 <= code_point <= 0xDFFF:
            msg = f"0x{code_point:x} is not valid in UTF-8 (reserved for UTF-16 surrogates)"
            raise ValueError(msg)
        codepoints.append(code_point)

    return "".join(chr(c) for c in codepoints)


if __name__ == "__main__":
    raw_bytes = b"\x49\xc2\xbb\xe2\x98\xba\xf0\x9f\xa9\xb7"
    decoded: str = decode(raw_bytes)
    print(decoded)
    raise SystemExit(0)

Note that it does not check for many errors, and if the input bytes are truncated I can get an IndexError exception. Also, for the bitwise operations I use hexadecimal literals instead of binary ones, for example 0xF8 instead of 0b11111000. It is only meant as an example.
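
For comparison, here is one possible way to make the decoder more tolerant. This is a sketch, not a conforming implementation. The names decode_lenient, PATTERNS and REPLACEMENT are made up for the example. It checks for truncated sequences, bad continuation bytes, overlong encodings, surrogates and out-of-range values, and emits U+FFFD for each offending byte instead of raising:

```python
REPLACEMENT = 0xFFFD  # U+FFFD, the Unicode replacement character

# (mask, expected header, sequence length, smallest code point allowed)
PATTERNS = [
    (0x80, 0x00, 1, 0x0),
    (0xE0, 0xC0, 2, 0x80),
    (0xF0, 0xE0, 3, 0x800),
    (0xF8, 0xF0, 4, 0x10000),
]


def decode_lenient(raw: bytes) -> str:
    out = []
    index = 0
    while index < len(raw):
        lead = raw[index]
        code_point, step = REPLACEMENT, 1  # assume the byte is bad
        for mask, header, length, minimum in PATTERNS:
            if (lead & mask) != header:
                continue
            # Slicing never raises: a truncated tail is just shorter.
            tail = raw[index + 1:index + length]
            if len(tail) == length - 1 and all((b & 0xC0) == 0x80 for b in tail):
                value = lead & ~mask & 0xFF  # payload bits of the lead byte
                for b in tail:
                    value = (value << 6) | (b & 0x3F)
                in_range = minimum <= value <= 0x10FFFF  # rejects overlong forms
                is_surrogate = 0xD800 <= value <= 0xDFFF
                if in_range and not is_surrogate:
                    code_point, step = value, length
            break
        out.append(code_point)
        index += step
    return "".join(chr(c) for c in out)


print(decode_lenient(b"\x49\xc2\xbb\xe2\x98\xba\xf0\x9f\xa9\xb7"))
print(repr(decode_lenient(b"a\xc3")))  # truncated ñ, no IndexError
```

The number of replacement characters produced for a broken sequence may differ from what bytes.decode("utf-8", errors="replace") gives, so do not rely on it matching the built-in decoder on invalid input.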

Something more interesting, which does not use "if", is discussed here

Sorry if this turned out a bit long and confusing. I read more than I write, so I lack the "muscle" for writing. I just wanted to share things I learn as something catches my attention.
