Lesson 1: Setting Up Your Development Environment

Learning Objectives

By the end of this lesson, you will:

  • Have a complete Python development environment set up
  • Understand the project structure for building embeddings
  • Run your first "Hello Embeddings" program
  • Know how to execute code using different methods

Prerequisites

  • Basic command line knowledge
  • A computer with macOS, Windows, or Linux
  • Internet connection for downloading tools

1. Installing Python with uv

What is uv?

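uv is a fast, modern Python package and project manager from Astral. It can install Python itself, automatically creates and manages a virtual environment for each project, and records your dependencies. That makes it a good fit for learning projects.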

Installation Steps

macOS/Linux:

curl -LsSf https://astral.sh/uv/install.sh | sh

Windows:

powershell -c "irm https://astral.sh/uv/install.ps1 | iex"

Verify Installation:

uv --version

You should see a version string such as: uv 0.5.x (your exact version may be newer)
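If uv --version reports that the command is not found, the directory the installer used is often not on your PATH yet. Opening a new terminal usually fixes this. If you already have some Python installed, the optional sketch below checks from Python whether uv is reachable. The function name find_uv is our own and is not part of uv.

```python
import shutil
import subprocess


def find_uv():
    """Return the output of `uv --version`, or None if uv is not on PATH."""
    uv_path = shutil.which("uv")  # full path to the uv executable, or None
    if uv_path is None:
        return None
    result = subprocess.run(
        [uv_path, "--version"], capture_output=True, text=True, check=False
    )
    return result.stdout.strip() or None


if __name__ == "__main__":
    version = find_uv()
    if version is None:
        print("uv not found on PATH. Try opening a new terminal after installing.")
    else:
        print(f"Found: {version}")
```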

2. Setting Up VS Code

Download and Install

  1. Go to code.visualstudio.com
  2. Download for your operating system
  3. Install following the default options

Essential Extensions

Open VS Code and install these extensions (Cmd/Ctrl+Shift+X):

  1. Python - Official Python extension by Microsoft
  2. Python Debugger - For debugging Python code
  3. Jupyter - For running notebook-style code

3. Creating Your Project

Step 1: Create Project Directory

mkdir learn-embeddings
cd learn-embeddings

Step 2: Initialize with uv

uv init

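This creates:

  • pyproject.toml - Project configuration (name, required Python version, dependencies)
  • .python-version - The Python version this project uses
  • README.md - A project readme
  • hello.py - Sample file (we'll replace this; newer uv versions name it main.py)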

Step 3: Install Dependencies

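uv add numpy matplotlib

This adds both packages to the dependencies in pyproject.toml, records exact versions in a uv.lock file, and installs them into the project's virtual environment in the .venv folder, which uv creates automatically. This lesson's code doesn't use them yet. Installing them now gets the environment ready for what comes next.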
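To confirm the install worked, you can ask Python whether it can import each package. This sketch uses only the standard library (importlib.util.find_spec and importlib.metadata.version). The helper name check_packages, and whatever file name you save it under, are our own choices. Run it with uv run python so that it uses the project's environment.

```python
import importlib.metadata
import importlib.util


def check_packages(names):
    """Map each package name to its installed version, or None if missing."""
    report = {}
    for name in names:
        if importlib.util.find_spec(name) is None:
            report[name] = None  # not importable in this environment
        else:
            # For numpy and matplotlib the import name matches the distribution name
            report[name] = importlib.metadata.version(name)
    return report


if __name__ == "__main__":
    for name, version in check_packages(["numpy", "matplotlib"]).items():
        print(f"{name}: {version if version else 'NOT INSTALLED'}")
```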

4. Your First Embedding Code

Create a file called lesson1_hello.py:

1"""
2Lesson 1: Hello Embeddings
3Understanding what embeddings are through a simple example
4"""
5
6def main():
7 # What is an embedding?
8 # An embedding is a way to represent words as numbers
9
10 # Let's start with a tiny vocabulary
11 words = ["hello", "world", "embedding"]
12
13 # Simple approach: assign each word a unique number
14 word_to_id = {}
15 for i, word in enumerate(words):
16 word_to_id[word] = i
17
18 print("Our vocabulary:")
19 print(word_to_id)
20 print(f"\nThe word 'hello' is represented as: {word_to_id['hello']}")
21 print(f"The word 'world' is represented as: {word_to_id['world']}")
22 print(f"The word 'embedding' is represented as: {word_to_id['embedding']}")
23
24 # This is the simplest form of embedding!
25 # Each word gets a unique integer ID
26
27if __name__ == "__main__":
28 main()

5. Running Your Code

Method 1: Direct Execution with uv

uv run python lesson1_hello.py
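uv run executes the command inside the project's virtual environment, which is the .venv folder uv created. If you want to check this yourself, the sketch below prints the interpreter path and compares sys.prefix with sys.base_prefix. The two differ when Python runs from a virtual environment. The function name in_virtual_env is hypothetical.

```python
import sys


def in_virtual_env():
    """True when running inside a virtual environment such as uv's .venv."""
    # In a virtual environment, sys.prefix points at the environment,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print(f"Interpreter: {sys.executable}")
    print(f"Inside a virtual environment: {in_virtual_env()}")
```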

Method 2: VS Code Run Button

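  1. Open lesson1_hello.py in VS Code
  2. Check that VS Code is using the project's .venv interpreter (shown in the status bar). If it isn't, run "Python: Select Interpreter" from the Command Palette
  3. Click the "Run Python File" button (▶) in the top right, or press Ctrl+F5 (Run Without Debugging; the default shortcut is Ctrl+F5 on macOS too)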

Method 3: Interactive Python (REPL)

uv run python

Then type:

exec(open('lesson1_hello.py').read())
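An alternative from the standard library is runpy.run_path. When you pass run_name="__main__", it runs a file as the main module, so the script's if __name__ == "__main__": block executes, and it returns the script's global namespace as a dictionary. Here is a minimal sketch; the wrapper name run_script is our own.

```python
import runpy


def run_script(path):
    """Execute a Python file as __main__ and return its global namespace."""
    # run_name="__main__" makes the script's `if __name__ == "__main__":`
    # block run, just as `python path` would.
    return runpy.run_path(path, run_name="__main__")
```

In the REPL, the equivalent one-off call is import runpy followed by runpy.run_path("lesson1_hello.py", run_name="__main__"). Unlike exec, it keeps the script's variables in the returned dictionary instead of adding them to your REPL session.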

Method 4: Jupyter-style in VS Code

  1. Create a new file lesson1_notebook.py
  2. Add this comment at the top: # %%
  3. Write your code in cells, starting each cell with a # %% line:

# %%
# Cell 1: Define our vocabulary
words = ["hello", "world", "embedding"]
print(f"Vocabulary size: {len(words)}")

# %%
# Cell 2: Create word-to-ID mapping
word_to_id = {word: i for i, word in enumerate(words)}
print(word_to_id)

# %%
# Cell 3: Test our mapping
test_word = "hello"
print(f"'{test_word}' maps to ID: {word_to_id[test_word]}")
Click "Run Cell" above each # %% marker to execute cells individually.

6. Understanding the Output

When you run the code, you'll see:

Our vocabulary:
{'hello': 0, 'world': 1, 'embedding': 2}

The word 'hello' is represented as: 0
The word 'world' is represented as: 1
The word 'embedding' is represented as: 2
🔑

Key Concept

Embeddings convert words to numbers. Machine learning models compute with numbers, not raw text, so NLP systems start by mapping words (or smaller pieces of text) to numeric representations. Giving each word a unique integer ID is the simplest such mapping. It is also the foundation for more advanced embeddings like Word2Vec, which use each word's ID to look up a learned vector.
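Once each word has an ID, a whole sentence can become a list of numbers. The sketch below assumes a deliberately naive tokenizer that lowercases the text and then splits it on whitespace. It maps unknown words to -1, the same convention as get_word_id_v3 in Exercise 3. The function name encode is our own.

```python
def encode(sentence, word_to_id, unknown_id=-1):
    """Turn a sentence into a list of word IDs."""
    # Naive tokenizer (an assumption): lowercase, then split on whitespace.
    tokens = sentence.lower().split()
    return [word_to_id.get(token, unknown_id) for token in tokens]


words = ["hello", "world", "embedding"]
word_to_id = {word: i for i, word in enumerate(words)}

print(encode("Hello world", word_to_id))             # [0, 1]
print(encode("hello embedding python", word_to_id))  # [0, 2, -1]
```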

7. Project Structure Best Practices

Your project should look like:

learn-embeddings/
├── pyproject.toml
├── .python-version
├── books/
│   ├── chapter-1/
│   │   ├── lesson-1.md
│   │   └── lesson-2.md
│   └── README.md
├── code/
│   ├── lesson1_hello.py
│   └── lesson1_notebook.py
└── README.md

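Project structure explanation:

  • pyproject.toml - Project configuration and dependencies
  • .python-version - Python version used by the project
  • books/ - Learning materials
  • code/ - Your code implementations
  • uv.lock and .venv/ (not shown) - Created by uv add: the lockfile, which is worth committing, and the virtual environment, which is not

If you keep your scripts in code/, run them from the project root with uv run python code/lesson1_hello.py.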
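If you prefer to create this layout from Python, pathlib can do it. This minimal sketch assumes you run it from the project root that uv init created, where pyproject.toml and .python-version already exist. It creates only empty placeholder files and does not move any existing scripts. The function name create_skeleton is hypothetical.

```python
from pathlib import Path


def create_skeleton(root="."):
    """Create the books/ and code/ layout with empty placeholder files."""
    root = Path(root)
    placeholder_files = [
        root / "books" / "chapter-1" / "lesson-1.md",
        root / "books" / "chapter-1" / "lesson-2.md",
        root / "books" / "README.md",
        root / "code" / "lesson1_hello.py",
        root / "code" / "lesson1_notebook.py",
        root / "README.md",
    ]
    for path in placeholder_files:
        # Create any missing parent folders; no error if they already exist.
        path.parent.mkdir(parents=True, exist_ok=True)
        # touch() creates an empty file, or only updates the timestamp of an existing one.
        path.touch(exist_ok=True)
```

Call create_skeleton(".") from the project root. Because exist_ok=True is used and touch() never truncates a file, running it a second time is harmless. Copy your existing lesson1_hello.py into code/ yourself afterwards.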
💡
Development Tip

Always organize your learning projects with clear structure. Separate your learning materials (books/) from your code implementations (code/). This makes it easier to reference lessons while coding and keeps everything organized as you progress.

Practice Exercises

Exercise 1: Extend the Vocabulary

Add 5 more words to the vocabulary and print their IDs.

Your task: Modify the original code to include these words: "python", "machine", "learning", "neural", "network"


Solution:

# Extended vocabulary with 5 more words
words = ["hello", "world", "embedding", "python", "machine", "learning", "neural", "network"]

word_to_id = {}
for i, word in enumerate(words):
    word_to_id[word] = i

print("Extended vocabulary:")
for word, word_id in word_to_id.items():  # word_id avoids shadowing the built-in id()
    print(f"'{word}': {word_id}")

Output:

Extended vocabulary:
'hello': 0
'world': 1
'embedding': 2
'python': 3
'machine': 4
'learning': 5
'neural': 6
'network': 7
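The enumerate loop works here because every word in the list is different. If the list came from real text with repeated words, a later occurrence would overwrite the earlier ID. For example, with "the cat sat on the mat".split(), "the" would end up with ID 4 and ID 0 would never be used. The sketch below assigns IDs in order of first appearance instead; build_vocab is our own name.

```python
def build_vocab(words):
    """Assign consecutive IDs to words in order of first appearance."""
    word_to_id = {}
    for word in words:
        if word not in word_to_id:  # skip repeats so IDs stay 0, 1, 2, ...
            word_to_id[word] = len(word_to_id)
    return word_to_id


text = "the cat sat on the mat"
print(build_vocab(text.split()))
# {'the': 0, 'cat': 1, 'sat': 2, 'on': 3, 'mat': 4}
```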

Exercise 2: Reverse Mapping

Create an id_to_word dictionary that maps IDs back to words.

Your task: Given the word_to_id dictionary, create the reverse mapping so you can look up words by their ID numbers.


Solution:

1words = ["hello", "world", "embedding"]
2
3# Create word-to-ID mapping
4word_to_id = {word: i for i, word in enumerate(words)}
5
6# Create reverse mapping (ID-to-word)
7id_to_word = {i: word for word, i in word_to_id.items()}
8
9print("Word to ID:", word_to_id)
10print("ID to Word:", id_to_word)
11
12# Test the reverse mapping
13test_id = 1
14print(f"ID {test_id} maps to word: '{id_to_word[test_id]}'")

Output:

Word to ID: {'hello': 0, 'world': 1, 'embedding': 2}
ID to Word: {0: 'hello', 1: 'world', 2: 'embedding'}
ID 1 maps to word: 'world'

Alternative approach using list indexing:

# Since our IDs are sequential starting from 0,
# we can use the original words list as reverse mapping
words = ["hello", "world", "embedding"]
word_to_id = {word: i for i, word in enumerate(words)}

# words[id] gives us the word for that ID
test_id = 2
print(f"ID {test_id} maps to word: '{words[test_id]}'")

Exercise 3: Handle Unknown Words

What happens if you try to look up a word not in the vocabulary? Add error handling.

Your task: Try to look up a word like "python" that's not in your vocabulary. Then add proper error handling to gracefully handle unknown words.


Solution:

1words = ["hello", "world", "embedding"]
2word_to_id = {word: i for i, word in enumerate(words)}
3
4def get_word_id(word, word_to_id_dict):
5 """
6 Get word ID with proper error handling
7 """
8 if word in word_to_id_dict:
9 return word_to_id_dict[word]
10 else:
11 print(f"Warning: Word '{word}' not found in vocabulary!")
12 return None
13
14# Test with known words
15print("Known word:", get_word_id("hello", word_to_id))
16print("Known word:", get_word_id("world", word_to_id))
17
18# Test with unknown word
19print("Unknown word:", get_word_id("python", word_to_id))
20
21# Alternative: Using try-except
22def get_word_id_v2(word, word_to_id_dict):
23 try:
24 return word_to_id_dict[word]
25 except KeyError:
26 print(f"Error: '{word}' is not in vocabulary")
27 return -1 # Return -1 for unknown words
28
29print("Unknown word (v2):", get_word_id_v2("machine", word_to_id))
30
31# Alternative: Using dict.get() with default
32def get_word_id_v3(word, word_to_id_dict, unknown_id=-1):
33 return word_to_id_dict.get(word, unknown_id)
34
35print("Unknown word (v3):", get_word_id_v3("learning", word_to_id))

Output:

Known word: 0
Known word: 1
Warning: Word 'python' not found in vocabulary!
Unknown word: None
Error: 'machine' is not in vocabulary
Unknown word (v2): -1
Unknown word (v3): -1
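Returning -1 for unknown words has a catch if IDs are later used as list or array indices. In Python, index -1 silently selects the last element instead of raising an error. A common NLP convention is to reserve a special unknown-word token with its own valid ID. The sketch below assumes the token string "<unk>" and ID 0; both are choices, not requirements.

```python
UNK = "<unk>"  # assumed name for the reserved unknown-word token


def build_vocab_with_unk(words):
    """Reserve ID 0 for unknown words, then number real words from 1."""
    word_to_id = {UNK: 0}
    for word in words:
        # setdefault only inserts new words, so repeats keep their first ID
        word_to_id.setdefault(word, len(word_to_id))
    return word_to_id


def lookup(word, word_to_id):
    """Return the word's ID, falling back to the <unk> ID."""
    return word_to_id.get(word, word_to_id[UNK])


vocab = build_vocab_with_unk(["hello", "world", "embedding"])
print(vocab)                    # {'<unk>': 0, 'hello': 1, 'world': 2, 'embedding': 3}
print(lookup("python", vocab))  # 0
print(lookup("world", vocab))   # 2
```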

Key Takeaways

✅ Environment Setup: uv manages Python, virtual environments, and packages for you
✅ VS Code: Provides multiple ways to run Python code
✅ First Embedding: Words can be represented as unique numbers
✅ Foundation: Integer IDs are the starting point that richer, vector-based embeddings build on

Next Lesson Preview

In Lesson 2, we'll improve our embeddings by representing words as vectors instead of single numbers. This allows us to capture relationships between words!

