Module 3 of 26

Units, notation, and binary basics

Bits, bytes, binary counting, hexadecimal, endianness, and why unit precision matters in every data system from storage allocation to network protocols.

By the end of this module you will be able to:

  • Convert between bits, bytes, and common storage prefixes correctly
  • Distinguish IEC binary prefixes (KiB, MiB, GiB) from SI decimal prefixes (kB, MB, GB)
  • Read a 4-bit binary number and express a byte value in hexadecimal
  • Explain why unit ambiguity causes real data engineering problems

Same digits at four different scales, four different quantities

The same digits 12 mean four different quantities depending on whether the scale is percent, probability, decimal byte, or hex.

[Figure: Same digits, different scales, different quantities. As percent, 12 % resolves to 12 per 100; as probability, 0.12 resolves to 12 in every 100; as a decimal byte count, 12 bytes resolves to 96 bits stored; as a hex value, 0x12 resolves to 18 in decimal. Callout: a naked number is an unfinished claim. Always carry the unit and scale next to the value. The byte vs hex confusion is the most common data-import defect.]

The digits 12 alone do not commit to a meaning. As a percent they mean twelve per hundred; as a probability, twelve hundredths; as a decimal byte count, twelve bytes (96 bits); as hexadecimal, the value eighteen. ISO/IEC 80000-13 and IEC 60027-2 fix the bit, byte, and prefix conventions formally.
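The four readings can be sketched in a few lines of Python. The variable names are illustrative only and do not come from the module.

```python
# The same two digits, "12", read on four different scales.
digits = "12"

as_percent = int(digits)             # 12 per 100
as_probability = int(digits) / 100   # 0.12 on a 0-to-1 scale
as_byte_count = int(digits)          # 12 bytes
bits_stored = as_byte_count * 8      # 8 bits per byte
as_hex_value = int(digits, 16)       # "12" parsed as base 16

print(f"percent:     {as_percent} per 100")
print(f"probability: {as_probability}")
print(f"byte count:  {as_byte_count} bytes = {bits_stored} bits")
print(f"hex value:   0x{digits} = {as_hex_value} in decimal")
```

Nothing in the string "12" says which reading is intended; the scale has to travel with the value.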

Why decimal kB and binary KiB are not the same

Decimal kB and binary KiB are two parallel storage scales. The gap grows with every prefix.

[Figure: Storage scale, decimal kB vs binary KiB (IEC 80000-13:2008). Decimal SI lane: bit (0 or 1), byte (8 bits), kilobyte (1 kB = 1000 bytes). Binary IEC lane: bit (0 or 1), byte (8 bits), kibibyte (1 KiB = 1024 bytes). Callout: the gap grows with every prefix. kB vs KiB: 2.4 percent. MB vs MiB: 4.9 percent. GB vs GiB: 7.4 percent. TB vs TiB: 10 percent. Storage vendors advertise in SI; operating systems report in IEC. Capacity claims that look short are usually correct.]

Kilobyte (kB) is 1000 bytes; kibibyte (KiB) is 1024. The 2.4 percent gap compounds at every prefix step, because each step multiplies by 1024 rather than 1000: 4.9 percent at megabyte, 7.4 percent at gigabyte, and about 10 percent at terabyte. IEC 80000-13:2008 fixed the binary prefix names so vendors and learners can disambiguate.
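The compounding can be checked with a short calculation. This is a minimal sketch; the helper name `prefix_gap_percent` is an assumption made for illustration.

```python
# Relative gap between each IEC binary prefix and its SI decimal counterpart.
PREFIXES = [("kB", "KiB"), ("MB", "MiB"), ("GB", "GiB"), ("TB", "TiB")]

def prefix_gap_percent(step: int) -> float:
    """Percent by which 1024**step exceeds 1000**step."""
    return (1024 ** step / 1000 ** step - 1) * 100

# step 1 is kilo/kibi, step 2 is mega/mebi, and so on
gaps = {
    iec: prefix_gap_percent(step)
    for step, (_, iec) in enumerate(PREFIXES, start=1)
}

for si, iec in PREFIXES:
    print(f"{si} vs {iec}: {gaps[iec]:.1f} percent")
```

Each step multiplies the ratio by another factor of 1.024, which is why the gap grows rather than staying at 2.4 percent.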

Percent or probability: the triage that prevents category mistakes

Choosing between percent and probability starts with a triage question: are you stating a share or a chance?

[Figure: Percent vs probability decision tree (NIST §1.3.5). Triage question, asked first: are you stating a share or a chance? Share leads to percent; chance with a base leads to probability. Percent path: ratio on a 100 scale. Example: 12 percent of files exceed the size limit. Rule: name the population; without it, percent is meaningless. Probability path: likelihood from 0 to 1. Example: probability of fraud given the signal is 0.12. Rule: name the event and the conditioning context; without both, probability is undefined. Callout: "12 percent risk" reported as a probability of 0.12 with no base rate is a category mistake.]

Percent is a ratio on a 100 scale; probability is a likelihood on a 0 to 1 scale. They look interchangeable but behave differently. Once the triage question is answered, state the population for a percent, or the event and its base for a probability. NIST Engineering Statistics Handbook §1.3.5 separates the two formally.

Bits, bytes, and storage prefixes

A bit is the smallest unit of digital information, representing one of two possible states: 0 or 1. A byte is a group of 8 bits. It is the standard addressable unit of memory in most computer architectures. A single byte can represent 256 distinct values (2^8 = 256), sufficient to encode one character in legacy character sets such as ASCII.
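As a quick illustration of the byte arithmetic, the sketch below counts the values one byte can hold and encodes a single ASCII character. The names are illustrative, not part of the module.

```python
BITS_PER_BYTE = 8

# Each bit doubles the number of representable states.
distinct_values = 2 ** BITS_PER_BYTE

# One ASCII character occupies exactly one byte.
encoded = "A".encode("ascii")
code_point = encoded[0]                   # the byte as an integer
bit_pattern = format(code_point, "08b")   # the same byte as 8 binary digits

print(distinct_values)
print(len(encoded), code_point, bit_pattern)
```

The largest value a byte can hold is therefore 255, one less than the count of distinct values, because counting starts at 0.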

Two incompatible prefix systems are in use, and they conflict:

  • SI (decimal) prefixes: 1 kilobyte (kB) = 1,000 bytes. Hard drive manufacturers use this system.
  • IEC (binary) prefixes, standardised in IEC 80000-13:2008: 1 kibibyte (KiB) = 1,024 bytes (2^10). Operating systems historically used this system while labelling units as "KB," which caused the confusion.

The gap widens at larger scales. A "1 TB" hard drive contains 1,000,000,000,000 bytes (SI). Windows, which calculates in binary multiples, reports this as approximately 931 GiB but labels the figure "GB," making the drive appear smaller than advertised. A data pipeline that expects MB (SI) but receives values in MiB (IEC) will underestimate storage requirements by approximately 4.9%. Across petabyte-scale operations, this becomes significant.
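The "1 TB" drive arithmetic can be reproduced as follows. This is a sketch only; `format_size` is a hypothetical helper, not an operating system API.

```python
GB = 10 ** 9    # bytes in one gigabyte (SI, decimal)
GIB = 2 ** 30   # bytes in one gibibyte (IEC, binary)

def format_size(num_bytes: int, binary: bool) -> str:
    """Render a byte count with an explicit, unambiguous unit label."""
    if binary:
        return f"{num_bytes / GIB:.2f} GiB"
    return f"{num_bytes / GB:.2f} GB"

drive_bytes = 10 ** 12  # a drive advertised as "1 TB"

print(format_size(drive_bytes, binary=False))  # vendor's decimal view
print(format_size(drive_bytes, binary=True))   # binary view
```

The byte count never changes; only the divisor and the label do. Printing the label that matches the divisor removes the apparent shortfall.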

With bits, bytes, and storage prefixes in place, the next section turns to binary, hexadecimal, and endianness.

Binary, hexadecimal, and endianness

Computers use base-2 (binary) arithmetic because electronic circuits reliably represent two states. In binary, digit positions represent powers of 2. Binary 0101 = 0 + 4 + 0 + 1 = 5 in decimal. Binary 1011 = 8 + 0 + 2 + 1 = 11. A 4-bit number can represent values from 0000 (0) to 1111 (15), giving 16 possible values (2^4). An 8-bit byte gives 256 values.
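The place-value rule can be written out directly. The function below is a minimal sketch with an illustrative name; Python's built-in `int(bits, 2)` performs the same conversion.

```python
def binary_to_decimal(bits: str) -> int:
    """Sum the powers of 2 for every position that holds a 1."""
    total = 0
    # The rightmost digit is position 0, worth 2**0 = 1.
    for position, bit in enumerate(reversed(bits)):
        if bit not in ("0", "1"):
            raise ValueError(f"not a binary digit: {bit!r}")
        total += int(bit) * 2 ** position
    return total

for bits in ("0101", "1011", "1111"):
    print(bits, "=", binary_to_decimal(bits))
```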

Hexadecimal (base-16) uses digits 0-9 and letters A-F. Each hex digit represents exactly 4 bits; two hex digits represent one byte. The web colour #FF5733 encodes RGB (Red 255, Green 87, Blue 51). MAC addresses appear as six hex pairs such as A4:C3:F0:85:AC:2D. IPv6 addresses use eight groups of four hex digits. Memory addresses in debuggers are expressed in hex.
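To make the two-hex-digits-per-byte rule concrete, the sketch below splits a web colour into its three channel bytes. The function name `hex_colour_to_rgb` is an assumption for illustration.

```python
def hex_colour_to_rgb(colour: str) -> tuple[int, int, int]:
    """Split '#RRGGBB' into three channel values, one byte each."""
    digits = colour.lstrip("#")
    if len(digits) != 6:
        raise ValueError("expected six hex digits")
    # Each pair of hex digits is one byte: 4 bits + 4 bits.
    red, green, blue = (int(digits[i:i + 2], 16) for i in (0, 2, 4))
    return red, green, blue

red, green, blue = hex_colour_to_rgb("#FF5733")
print(red, green, blue)

# One hex digit always expands to exactly four binary digits.
print(format(0xF, "04b"), format(0x5, "04b"))
```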

Endianness is the order in which bytes are stored for multi-byte values. Big-endian stores the most significant byte first (used by network protocols, also called "network byte order"). Little-endian stores the least significant byte first (used by x86 processors: Intel, AMD). The 32-bit integer 0x12345678 is stored as "12 34 56 78" in big-endian and "78 56 34 12" in little-endian. Mismatches between systems exchanging binary data without agreeing on byte order cause silent data corruption.
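The byte layouts above can be reproduced with Python's standard library. This is a sketch under the assumption of a 4-byte unsigned integer; `hex_dump` is a hypothetical helper for display.

```python
import struct

def hex_dump(data: bytes) -> str:
    """Show bytes as space-separated hex pairs, in storage order."""
    return " ".join(f"{byte:02x}" for byte in data)

value = 0x12345678

big = value.to_bytes(4, byteorder="big")        # most significant byte first
little = value.to_bytes(4, byteorder="little")  # least significant byte first

print("big-endian:   ", hex_dump(big))
print("little-endian:", hex_dump(little))

# struct's "!" prefix means network byte order, which is big-endian.
network = struct.pack("!I", value)

# Decoding big-endian bytes as little-endian raises no error;
# it silently yields a different integer.
misread = int.from_bytes(big, byteorder="little")
print(hex(misread))
```

The misread value is a valid integer, which is why byte-order mismatches tend to surface as wrong data rather than as exceptions.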

The names and symbols for binary multiples shall be formed by attaching the appropriate prefix symbol to the symbol 'B' for byte. The binary prefixes are: kibi (Ki), mebi (Mi), gibi (Gi), tebi (Ti). These are distinct from the SI prefixes kilo, mega, giga, tera.

IEC 80000-13:2008, Quantities and units for information science and technology

Common misconception

All systems agree on what 1 KB or 1 GB means, so I don't need to specify the unit system.

In practice, network equipment, storage hardware, operating systems, and programming languages all have different defaults. Hard drive manufacturers use SI (1 GB = 1,000,000,000 bytes); operating systems historically used binary multiples (1 GB = 1,073,741,824 bytes) while still labelling them 'GB'. The difference at 1 GB is approximately 7.4%. Never assume which prefix system a system is using. Always document the unit system explicitly in data schemas and pipeline specifications. The NASA Mars Climate Orbiter was destroyed because one team's software supplied thruster data in imperial units while another team's software expected SI; the same category of assumption error affects data pipelines every day.
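One way to enforce explicit units at a pipeline boundary is to reject any size that arrives without a recognised unit. The sketch below is an illustration under that assumption: `parse_size` and `UNIT_BYTES` are hypothetical names, and treating the legacy label "KB" as an error is a design choice, not a standard.

```python
# Bytes per unit. The ambiguous legacy label "KB" is deliberately absent.
UNIT_BYTES = {
    "B": 1,
    "kB": 1000, "MB": 1000 ** 2, "GB": 1000 ** 3, "TB": 1000 ** 4,
    "KiB": 1024, "MiB": 1024 ** 2, "GiB": 1024 ** 3, "TiB": 1024 ** 4,
}

def parse_size(text: str) -> int:
    """Convert '<number> <unit>' to bytes; reject missing or unknown units."""
    parts = text.split()
    if len(parts) != 2:
        raise ValueError(f"size must be '<number> <unit>', got {text!r}")
    number, unit = parts
    if unit not in UNIT_BYTES:
        raise ValueError(f"unknown or ambiguous unit: {unit!r}")
    return int(number) * UNIT_BYTES[unit]

sent = parse_size("512 KiB")     # what a binary-unit sender means
assumed = parse_size("512 kB")   # what a decimal-unit receiver allocates
print(sent, assumed, sent - assumed)
```

A bare `size: 512` fails fast here instead of being silently interpreted, which turns a potential buffer shortfall into a visible validation error.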

Check your understanding

A data pipeline receives a file with the header 'size: 512' and the sending system uses IEC binary units (KiB); the receiving system allocates buffer space using SI decimal units (KB). Which statement is correct?

A web designer specifies the background colour as #1A2B3C. What is the decimal value of the red channel?

Key takeaways

  • A bit is a single binary digit (0 or 1); a byte is 8 bits, capable of 256 distinct values. Data is ultimately stored and transmitted as sequences of bits.
  • SI prefixes (kB, MB, GB) use powers of 10; IEC prefixes (KiB, MiB, GiB) use powers of 2. The gap grows significantly at larger scales: 1 TB (SI) is about 931 GiB, or 0.909 TiB (IEC). Always specify which system your pipeline uses.
  • Hexadecimal compresses binary into a readable form: each hex digit represents exactly 4 bits. Hex appears in colour codes, MAC addresses, memory addresses, and IPv6.
  • Endianness determines byte order in multi-byte values. Big-endian is used by network protocols; little-endian by x86 processors. Mismatches cause silent data corruption.
  • Unit ambiguity is an engineering risk at every scale. The NASA Mars Climate Orbiter loss ($327.6 million, 1999) is the canonical example of what unit mismatches cost when not caught.

Now that you can work with bits, bytes, and unit systems, the next module builds on that foundation by examining how data is represented in practical formats: character encodings, floating-point numbers, JSON, CSV, and compression. Format choices made at ingestion time follow data through its entire lifecycle.

Standards and sources cited in this module

  1. IEC 80000-13:2008: Quantities and units for information science

    Authoritative definition of KiB, MiB, GiB binary prefixes and their distinction from SI prefixes.

  2. BIPM SI Brochure 9th edition (2019): Prefixes

    SI prefix definitions: kilo, mega, giga as powers of 10.

  3. NASA Mars Climate Orbiter Mishap Investigation Board Report, November 1999

    Root cause analysis: metric/imperial unit mismatch in thruster force data caused spacecraft destruction.