Introduction to Carbon: An Experimental Successor to C++

2025-03-01 14:22:54.586245

This chapter introduces Carbon, a new experimental programming language being developed at Google as a potential successor to C++. While still in its early stages, Carbon aims to address some of the perceived limitations of C++ in the modern software development landscape. This chapter will provide a crash course overview of Carbon, covering its motivations, setup, basic syntax, and core language features.

Successor: In programming, a successor language is intended to eventually replace an existing, often older, language by offering improvements while retaining compatibility or addressing similar problem domains.

Experimental Language: An experimental language is under active development and subject to change. It is not yet considered stable or production-ready and may lack features, tooling, and community support compared to established languages.

Crash Course: A crash course is a rapid and intensive introduction to a subject, designed to provide foundational knowledge quickly.

1. The Rationale Behind Carbon

The development of Carbon stems from a recognized need to evolve beyond the constraints of C++, a language that, despite its power and widespread use, has accumulated what is described as “technical debt” over its long history. Google, drawing parallels with successful language transitions like JavaScript to TypeScript, Java to Kotlin, and Objective-C to Swift, envisions Carbon as a similar evolution for C++.

Technical Debt: In software development, technical debt refers to the implied cost of rework caused by choosing an easy (but perhaps limited) solution now instead of using a better approach that would take longer. Over time, accumulated technical debt can hinder development and increase complexity.

1.1. Addressing C++‘s Challenges

While C++ remains a dominant force in performance-critical domains, its complexity and accumulated legacy present challenges:

Incremental Improvement Difficulties: Making significant, impactful improvements to C++ itself is a slow and arduous process due to its vast existing codebase and community.
Modern Developer Needs: C++ is perceived as struggling to fully meet certain modern developer expectations in areas like safety, ease of use, and rapid development cycles.

Carbon’s design philosophy prioritizes a fresh start, aiming to avoid directly inheriting the historical baggage of C and C++. Key foundational goals include:

Modern Generic System: Providing a more intuitive and powerful system for writing code that can work with various data types.
Modular Code Organization: Encouraging and facilitating better structuring of code into reusable and manageable components.
Consistent and Simple Syntax: Striving for a language syntax that is easier to read, write, and understand, reducing cognitive load for developers.

1.2. Carbon’s Design Goals

Carbon is being designed to excel in areas where C++ is traditionally strong, specifically targeting performance-critical software. The core language goals are:

Performance Critical Software Support: Maintaining or exceeding the performance characteristics of C++ is paramount.
Software and Language Evolution: Facilitating easier evolution of both software written in Carbon and the language itself over time.
Code Readability, Understandability, and Writability: Prioritizing a developer-friendly experience with code that is easy to work with.
Practical Safety and Testing Mechanisms: Incorporating features to enhance code safety and streamline testing processes.
Fast and Scalable Development: Enabling efficient and rapid software development workflows.
Modern OS Platforms Interoperability: Ensuring compatibility and seamless integration with contemporary operating systems.
Interoperability and Migration from Existing C++ Code: Crucially, Carbon is designed with bi-directional interoperability with C++. This means:
- C++ code can be used within Carbon projects.
- Carbon code can be used within existing C++ projects.
- This interoperability extends to features like inheritance and templates.

Bi-directional Interoperability: In programming languages, interoperability refers to the ability of different languages or systems to work together. Bi-directional interoperability means that code in language A can seamlessly use code written in language B, and vice versa.

1.3. Carbon vs. Rust: A Key Differentiator

When considering alternatives to C++, Rust is often mentioned. While Rust shares some goals with Carbon in terms of safety and performance, a significant distinction lies in Carbon’s emphasis on C++ interoperability. Carbon is explicitly designed to facilitate migration from and co-existence with existing C++ codebases, a feature not as directly emphasized in Rust’s design.

Rust: Rust is a modern systems programming language focused on safety, speed, and concurrency. It is often considered an alternative to C++ for performance-critical applications but has a different approach to memory management and interoperability.

2. Setting Up a Carbon Development Environment

As Carbon is still experimental, the setup process is not as straightforward as with mature languages. Currently, there isn’t a dedicated Carbon compiler or toolchain readily available for direct installation. Instead, the recommended approach involves using Google’s build tool, Bazel, along with LLVM (Low-Level Virtual Machine).

Compiler: A compiler is a program that translates source code written in a high-level programming language (like Carbon) into machine code or an intermediate language that can be executed by a computer.

Toolchain: A toolchain is a set of programming tools used to perform a complex software development task or to create a software product. It typically includes a compiler, linker, and other utilities.

Bazel: Bazel is an open-source build system developed by Google, designed for building software quickly and reliably. It supports multiple languages and platforms.

LLVM (Low-Level Virtual Machine): LLVM is a project that provides a collection of modular and reusable compiler and toolchain technologies. It’s used as the backend for many compilers and tools, including those for experimental languages.

2.1. Installation using Homebrew (macOS)

For macOS users, Homebrew simplifies the installation process for Bazel and LLVM.

Homebrew: Homebrew is a package manager for macOS (and Linux) that simplifies the installation of software from the command line.

Install Bazel: Open your terminal and run the command: brew install bazelisk
Install LLVM: Run the command: brew install llvm
Add LLVM to Path: After installation, you might need to add LLVM to your system’s PATH environment variable. The transcript provides a command for this, specific to the installation context.
Clone the Carbon Repository: Use Git to clone the official Carbon language repository from GitHub: git clone <repository_url> carbon-lang (replace <repository_url> with the actual repository URL).

Gist: In the context of GitHub, a gist is a simple way to share code snippets, notes, or configurations.

Repo (Repository): In version control systems like Git, a repository is a storage location for code and its revision history.

Clone (git clone): git clone is a Git command used to create a local copy of a remote repository on your computer.

2.2. Using Compiler Explorer (Online Option)

For a quicker and simpler way to experiment with Carbon without local setup, Compiler Explorer (compiler-explorer.com) offers an online environment. Simply select “Carbon” from the language options, and you can write and execute Carbon code directly in your browser. This is particularly useful for learning basic syntax and experimenting with code snippets.

Compiler Explorer: Compiler Explorer is an online tool that allows you to write and compile code in various languages and see the assembly output in real-time. It’s helpful for understanding how code is compiled and for quickly testing code snippets.

Syntax: Syntax refers to the set of rules that define the structure of a language. In programming, syntax dictates how code must be written to be considered valid and understood by the compiler or interpreter.

3. Basic Carbon Syntax and Concepts

This section introduces fundamental elements of Carbon syntax, drawing parallels to concepts familiar to programmers with experience in languages like C++, Java, or JavaScript.

3.1. Packages and Libraries

In Carbon, code is organized into packages. A package is a collection of libraries and serves as the standard unit for code distribution. The package name also acts as the root namespace for all names within its libraries.

Package: In programming, a package is a way to organize and group related code modules together. It helps in managing namespaces and dependencies, and promotes code reusability.

Every Carbon library must have exactly one API file. This file declares all public names (functions, classes, variables) that the library exposes. The definitions for these declarations can reside either in the API file itself or in an impl file within the same library.

API file: In Carbon, an API (Application Programming Interface) file declares the public interface of a library, specifying what functionality is available for external use.

Impl file: In Carbon, an impl (implementation) file contains the actual code that defines the functionality declared in the API file.

For example, a basic Carbon program starts with declaring a package:

package sandbox api;

3.2. The `main` Function: Program Entry Point

Like C++, Java, and many other languages, Carbon uses a main function as the entry point for program execution. The main function in Carbon has a specific structure:

fn Main() -> i32 {
  // Code to be executed
  return 0;
}

fn keyword: Used to declare a function in Carbon.
Main (Upper Camel Case): The name of the main function. Function names in Carbon typically use Upper Camel Case.
(): Parentheses indicate that the Main function takes no arguments.
-> i32: Specifies the return type of the function. i32 represents a 32-bit signed integer. The main function is expected to return an integer, conventionally 0 to indicate successful execution.

Main function: The main function is the starting point of execution in many programming languages. When a program is run, the operating system calls the main function to begin program execution.

Statically Typed Language: A statically typed language is one where the data type of a variable is known at compile time and is explicitly declared. Carbon, like C++, is statically typed.

Type Error: A type error occurs when an operation is performed on a value of an inappropriate data type. Statically typed languages detect type errors during compilation.

i32 (32-bit integer): i32 is a data type in Carbon representing a 32-bit signed integer, capable of storing whole numbers within a specific range (including negative and positive values).

3.3. Outputting Text with `Print`

The Print function (uppercase ‘P’) is used to display output to the console. It is a function call, not a language keyword.

Print("Hello, World!");

To print numbers, you typically need to use placeholders within a string:

Print("The answer is {0}", 42); // Prints: The answer is 42

3.4. Functions: Defining Reusable Code Blocks

Functions are defined using the fn keyword, followed by the function name, parameters (if any), return type, and the function body enclosed in curly braces {}.

fn OutputText() -> String {
  return "Hello, World!";
}

fn Add(x: i32, y: i32) -> i32 {
  return x + y;
}

Return Type Specification: Carbon is a statically typed language, so you must explicitly declare the return type of a function using -> Type. If a function doesn’t return a value (similar to void in C++ or Java), you simply omit the return type specification. This is referred to as a void function.

Void Function: A void function is a function that does not return any value. It performs actions but does not produce a result that is passed back to the calling code.

3.5. Data Types: Primitive Types

Carbon, being statically typed, requires you to be mindful of data types. Primitive data types are fundamental building blocks for representing different kinds of data. Carbon includes:

Primitive Data Types: Primitive data types are the most basic data types provided by a programming language. They are the fundamental building blocks for representing values.

Booleans (Bool): Represent truth values, either true or false.

Booleans: Booleans are a data type that can have one of two values: true or false. They are fundamental in representing logical conditions and making decisions in code.
Integers: Represent whole numbers. Carbon distinguishes between signed and unsigned integers. Signed integers can be positive or negative, while unsigned integers are always non-negative. Integer types are denoted by i (signed) or u (unsigned) followed by the number of bits (e.g., i8, i32, u64).

Signed Integers: Signed integers are integer data types that can represent both positive and negative whole numbers, as well as zero.

Unsigned Integers: Unsigned integers are integer data types that can only represent non-negative whole numbers (zero and positive numbers). They cannot store negative values.
Floating-Point Numbers: Represent numbers with decimal points. Carbon offers f16, f32, f64, and f128 for different precision levels. f32 and f64 are commonly used.

Floating-point types: Floating-point types are data types used to represent numbers with fractional parts (decimal numbers). They are essential for calculations involving real numbers.
Strings (String): Represent sequences of characters enclosed in double quotes ("). Multi-line strings can be created using triple double quotes (""").

String types: String types are used to represent text. They are sequences of characters and are a fundamental data type for working with textual information in programs.

3.6. Naming Conventions

Carbon follows specific naming conventions to enhance code readability and consistency.

Naming Conventions: Naming conventions are a set of rules for choosing names for variables, functions, classes, and other identifiers in code. Consistent naming conventions improve code readability and maintainability.

Upper Camel Case: Used for names that represent entities whose value does not vary dynamically during program execution. This includes:
- Functions (e.g., Main, OutputText)
- Classes (e.g., Point)
- Packages (e.g., Sandbox)
- Namespaces
Lower Snake Case: Used for variables, where the value is determined at runtime and can change. This involves lowercase letters with underscores separating words (e.g., text_variable, user_name).

3.7. Variables and Constants: Storing Data

Variables are used to store data that can change during program execution. In Carbon, variables are declared using the var keyword, followed by the variable name, type, and an optional initial value.

Variable: A variable is a named storage location in a computer’s memory that can hold a value. The value stored in a variable can be changed during program execution.

var text: String = "Hello there";
var count: i32 = 0;

Constants, on the other hand, are values that are fixed and cannot be changed after they are initialized. Constants are declared using the let keyword.

Constant: A constant is a named value that cannot be changed after it is initialized. Constants are useful for representing fixed values that should not be modified during program execution.

let greeting: String = "Welcome"; // greeting cannot be reassigned

Carbon also supports type inference using the auto keyword in some contexts, allowing the compiler to automatically deduce the data type based on the assigned value. However, explicit type declaration is generally recommended for clarity.

3.8. Variable Scope: Local and Global

Scope refers to the region of a program where a declared variable is accessible. Carbon, like many languages, has function scope and global variable scope.

Scope: In programming, scope refers to the region of a program where a declared variable or identifier is valid and can be accessed.

Function Scope: Variables declared within a function have function scope, meaning they are only accessible within that function.

Global Variable: A global variable is declared outside of any function and has global scope, meaning it can be accessed from any part of the program.

Variables declared inside a function are local to that function and cannot be accessed from outside. Variables declared outside of any function are considered global variables and can be accessed from anywhere in the program.

3.9. Strings: Working with Text

Carbon uses the String type for text manipulation. As mentioned earlier, strings are enclosed in double quotes. Multi-line strings are supported using triple double quotes, allowing you to span text across multiple lines.

Multi-line strings: Multi-line strings allow you to define string literals that span across multiple lines in the source code, preserving formatting and line breaks within the string.

3.10. Function Arguments (Parameters): Passing Data to Functions

Functions can accept input values through function arguments, also known as parameters. When defining a function, you specify the names and types of the parameters it expects.

Function arguments/parameters: Function arguments, also known as parameters, are values passed to a function when it is called. They allow you to provide input data to the function for it to operate on.

fn Greet(name: String) {
  Print("Hello, {0}!", name);
}

Greet("Alice"); // Calling the function with the argument "Alice"

3.11. Arrays: Ordered Collections of Elements

Arrays in Carbon are fixed-size, ordered collections of elements of the same data type. When declaring an array, you must specify both the data type of the elements and the size of the array.

Arrays: Arrays are data structures that store a fixed-size sequence of elements of the same data type. Elements in an array are accessed using their index, starting from zero.

var intArray: [i32; 5] = (10, 20, 30, 40, 50); // Array of 5 integers
var stringArray: [String; 2] = ("Hello", "Carbon"); // Array of 2 strings

Array elements are accessed using zero-based indexing within square brackets [].

Print("{0}", intArray[1]); // Accessing the element at index 1 (value 20)

3.12. Tuples: Heterogeneous Collections

Tuples are another way to group multiple values together. Unlike arrays, tuples can hold elements of different data types. Tuple types are defined by listing the types of elements within parentheses.

Tuples: Tuples are data structures that can hold a fixed-size sequence of elements, where each element can potentially have a different data type. Tuples are often used to group related items together.

var tuple1: auto = (1, 2, true, "hello"); // Tuple with type inference
var tuple2: (String, String, Bool, i32) = ("hello", "world", true, 123); // Explicitly typed tuple

Tuple elements are accessed using zero-based indexing, similar to arrays.

Print("{0}", tuple1[3]); // Accessing the element at index 3 (value "hello")

Tuples can also be converted to arrays if needed.

3.13. Structs: Custom Data Structures

Structs (structures) are user-defined data types that allow you to group together variables of different types under a single name. Structs are similar to objects in JavaScript or dictionaries in Python, representing key-value pairs.

Structs: Structs (structures) are user-defined data types that allow you to group together variables of different data types under a single name. They are used to create custom data structures that represent entities with multiple attributes.

Key-value pairs: Key-value pairs are a fundamental data structure where each value is associated with a unique key. Structs and objects often use key-value pairs to store and access data.

var person: auto = .{
  .name = "Brad",
  .loc = "Unknown"
};

Print("{0}", person.name); // Accessing the 'name' field of the struct
person.name = "John"; // Modifying a struct field

Structs can also be defined with explicit types for their fields:

var point: struct { x: i32, y: i32 } = .{
  .x = 1,
  .y = 2
};

3.14. Classes and Objects: Object-Oriented Programming

Classes in Carbon are blueprints for creating objects. They encapsulate data (properties or attributes) and behavior (methods or functions). Object-oriented programming (OOP) principles are supported in Carbon through classes.

Classes: In object-oriented programming, a class is a blueprint or template for creating objects. It defines the properties (data) and methods (behavior) that objects of that class will have.

Objects: An object is an instance of a class. It is a concrete entity that has the properties and behaviors defined by its class.

Properties/Attributes: Properties (or attributes) are data members of a class or object. They store information about the state of an object.

Methods: Methods are functions that are associated with a class or object. They define the behaviors or actions that objects of that class can perform.

Instantiate: To instantiate a class means to create an object of that class in memory. An object is an instance of a class.

Object-oriented programming (OOP): Object-oriented programming is a programming paradigm that organizes code around “objects,” which are instances of classes. OOP emphasizes concepts like encapsulation, inheritance, and polymorphism.

class Point {
  var x: i32;
  var y: i32;

  fn GetX(me: Point) -> i32 { // Method to get the x-coordinate
    return me.x;
  }

  fn GetY(me: Point) -> i32 { // Method to get the y-coordinate
    return me.y;
  }

  fn Origin() -> Point { // Function (static-like) to create a Point at origin
    return .{ .x = 0, .y = 0 };
  }
}

var p: Point = .{ .x = 100, .y = 200 }; // Instantiating a Point object
Print("{0}", p.GetY()); // Calling a method on the object
var originPoint: Point = Point.Origin(); // Calling a function on the class itself

class keyword: Used to define a class. Class names typically start with an uppercase letter (Upper Camel Case).
Properties within a Class: Declared using var inside the class definition.
Methods within a Class: Functions defined within a class are called methods. The first parameter of a method is conventionally named me and is of the class type itself. This acts similarly to the this keyword in other languages, allowing methods to access the object’s properties.
Functions within a Class (Static-like): Functions defined within a class without the me parameter are associated with the class itself rather than instances of the class. They are called using the class name (e.g., Point.Origin()).

3.15. Pointers: Working with Memory Addresses

Pointers are variables that hold the memory address of another variable. Carbon, like C++, supports pointers, providing lower-level memory manipulation capabilities. The ampersand operator (&) is used to get the address of a variable.

Pointers: Pointers are variables that store memory addresses. They “point to” locations in memory where data is stored, allowing for direct memory manipulation.

Address (memory address): A memory address is a unique identifier that specifies a location in a computer’s memory where data is stored.

Ampersand operator (&): In languages like C and C++, the ampersand operator (&) is used to obtain the memory address of a variable.

var x: i32 = 5;
var p: i32* = &x; // p is a pointer to an i32, pointing to the address of x

Print("{0}", x); // Prints 5
*p = 7;          // Dereferencing the pointer p and changing the value at that address
Print("{0}", x); // Prints 7 (x has been modified through the pointer)

Pointer Type: A pointer type is denoted by an asterisk (*) after the data type it points to (e.g., i32* is a pointer to an i32).
Dereferencing Operator (*): The asterisk (*) when used before a pointer variable name is the dereferencing operator. It accesses the value stored at the memory address pointed to by the pointer.

Pointers are often used in systems programming and scenarios where direct memory control and efficiency are critical. Memory allocation for variables typically happens on the heap or stack.

Heap (memory heap): The heap is a region of computer memory used for dynamic memory allocation. Memory allocated on the heap must be explicitly managed (allocated and deallocated) by the programmer in languages like C++.

3.16. Conditionals: `if-else` Statements

Conditionals, specifically if-else statements, allow you to execute different blocks of code based on whether a condition is true or false. The basic syntax is similar to C++, Java, and JavaScript.

Conditionals: Conditionals are programming constructs that allow you to control the flow of execution based on whether a certain condition is true or false.

If-else statements: If-else statements are a type of conditional statement that allows you to execute one block of code if a condition is true, and another block of code (optional) if the condition is false.

fn GuessNumber(num: i32) {
  if (num == 7) {
    Print("You guessed correctly!");
  } else {
    Print("Please try again.");
  }
}

GuessNumber(3); // Output: Please try again.
GuessNumber(7); // Output: You guessed correctly!

Important Note: At the time of the transcript, Carbon’s support for comparison operators (like >, <, >=, <=, !=) was limited in the online compiler explorer. The example uses only the equality operator (==).

3.17. Match Statements: Multi-way Branching

The match statement provides a way to perform multi-way branching based on the value of an expression. It is similar to a switch statement in other languages.

Match (switch statement equivalent): A match statement (or switch statement in other languages) is a control flow construct that allows you to execute different blocks of code based on the value of an expression. It provides a more structured alternative to nested if-else statements for handling multiple cases.

fn LuckyNumbers(number: i32) -> i32 {
  match (number) {
    7 => { Print("7 is a lucky number"); return number; }
    11 => { Print("11 is a lucky number"); return number; }
    12 => { Print("12 is a lucky number"); return number; }
    default => { Print("{0} is not a lucky number", number); return number; }
  }
}

LuckyNumbers(7);  // Output: 7 is a lucky number
LuckyNumbers(10); // Output: 10 is not a lucky number

match (expression): The expression to be evaluated is placed within parentheses after the match keyword.
case value => { ... }: Each case specifies a value to match against. If the expression’s value matches a case, the code block within the curly braces {} for that case is executed. The => is a “fat arrow” symbol.
default => { ... }: The default case is executed if none of the preceding cases match the expression’s value. It is similar to the default case in a switch statement.

3.18. Loops: `while` Loops for Iteration

Iteration, or looping, is a fundamental programming concept for repeating a block of code multiple times. Carbon supports while loops.

Iteration: Iteration, also known as looping, is a programming construct that allows you to repeatedly execute a block of code until a certain condition is met.

While loop: A while loop repeatedly executes a block of code as long as a specified condition remains true. The condition is checked at the beginning of each iteration.

var i: i32 = 11;
while (!(i == 0)) { // While i is NOT equal to 0
  Print("Number {0}", i);
  i = i - 1; // Decrement i
}

while (condition) { ... }: The while loop executes the code block within the curly braces {} as long as the condition inside the parentheses is true. The condition is checked before each iteration.
Decrement: In the example, i = i - 1; is a decrement operation, reducing the value of i by 1 in each iteration, ensuring the loop eventually terminates when i reaches 0.

Decrement: Decrementing is the process of decreasing the value of a variable, typically by one.

Note: At the time of the transcript, there was no demonstrated example of a for loop in Carbon. The while loop is presented as the primary iteration construct.

3.19. Generics: Parameterized Code for Type Flexibility

Generics provide a mechanism for writing parameterized code that can work with different data types without requiring separate code implementations for each type. Generics enhance code reusability and flexibility.

Generics: Generics are a programming feature that allows you to write code that can work with different data types without knowing the specific types at compile time. They provide type abstraction and code reusability.

Parameterized code: Parameterized code is code that is written to work with different types or values, often using placeholders or parameters to represent the varying parts. Generics are a form of parameterized code focused on types.

fn GetVal[T: type](x: T) -> T { // Generic function GetVal
  return x;
}

Print("{0}", GetVal[i32](100));    // Calling GetVal with an integer type
Print("{0}", GetVal[String]("Hello")); // Calling GetVal with a string type

Generic Function Definition: Generic functions are defined using square brackets [] after the function name to introduce type parameters. [T: type] declares a type parameter named T, which can represent any type that satisfies the type constraint.
Type Parameter Usage: Inside the function signature and body, the type parameter T can be used as a placeholder for types. In GetVal[T: type](x: T) -> T, T is used for both the parameter type (x: T) and the return type (-> T).
Calling Generic Functions: When calling a generic function, you specify the concrete type to be used within square brackets after the function name (e.g., GetVal[i32], GetVal[String]).

Generics are a powerful feature for creating reusable and type-safe code, especially in statically typed languages. They are similar in concept to templates in C++.

Template: In C++, templates are a feature that allows you to write generic code that can work with different data types without specifying the exact types at compile time. Carbon’s generics serve a similar purpose.

4. Conclusion and Future of Carbon

Carbon is presented as an exciting, albeit experimental, endeavor to create a successor language for C++. Its focus on C++ interoperability, modern language design principles, and goals of improved safety and developer experience make it a language to watch. However, it is crucial to remember that Carbon is currently in its early stages of development. It is not yet ready for production use, and its future direction will depend on community adoption and further development efforts. For those interested in language design and the evolution of programming paradigms, Carbon offers a fascinating case study and a potential glimpse into the future of systems programming.