# Lesson 1: Setting Up Your Development Environment
## Learning Objectives
By the end of this lesson, you will:
- Have a complete Python development environment set up
- Understand the project structure for building embeddings
- Run your first "Hello Embeddings" program
- Know how to execute code using different methods
## Prerequisites
- Basic command line knowledge
- A computer with macOS, Windows, or Linux
- Internet connection for downloading tools
## 1. Installing Python with uv

### What is uv?

`uv` is a modern, fast Python package manager that handles virtual environments automatically. It's perfect for learning projects.
### Installation Steps

**macOS/Linux:**

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

**Windows:**

```powershell
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
```

**Verify the installation:**

```bash
uv --version
```

You should see something like: `uv 0.5.x`
## 2. Setting Up VS Code

### Download and Install

- Go to [code.visualstudio.com](https://code.visualstudio.com)
- Download the installer for your operating system
- Install, following the default options
### Essential Extensions
Open VS Code and install these extensions (Cmd/Ctrl+Shift+X):
- Python - Official Python extension by Microsoft
- Python Debugger - For debugging Python code
- Jupyter - For running notebook-style code
## 3. Creating Your Project

### Step 1: Create the Project Directory

```bash
mkdir learn-embeddings
cd learn-embeddings
```

### Step 2: Initialize with uv

```bash
uv init
```

This creates:

- `pyproject.toml` - Project configuration
- `.python-version` - Python version specification
- `hello.py` - Sample file (we'll replace this)

### Step 3: Install Dependencies

```bash
uv add numpy matplotlib
```
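Before moving on, it can be worth confirming the dependencies actually installed into the project environment. A minimal check (run it with `uv run python`; the exact versions printed will depend on your install):

```python
# Sanity check: both packages should import without errors.
import numpy as np
import matplotlib

print("numpy version:", np.__version__)
print("matplotlib version:", matplotlib.__version__)

# A tiny numpy operation to confirm the package works
v = np.array([1.0, 2.0, 3.0])
print("vector mean:", v.mean())  # 2.0
```

If either import fails, re-run `uv add numpy matplotlib` from the project directory.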
## 4. Your First Embedding Code

Create a file called `lesson1_hello.py`:

```python
"""
Lesson 1: Hello Embeddings
Understanding what embeddings are through a simple example
"""

def main():
    # What is an embedding?
    # An embedding is a way to represent words as numbers

    # Let's start with a tiny vocabulary
    words = ["hello", "world", "embedding"]

    # Simple approach: assign each word a unique number
    word_to_id = {}
    for i, word in enumerate(words):
        word_to_id[word] = i

    print("Our vocabulary:")
    print(word_to_id)
    print(f"\nThe word 'hello' is represented as: {word_to_id['hello']}")
    print(f"The word 'world' is represented as: {word_to_id['world']}")
    print(f"The word 'embedding' is represented as: {word_to_id['embedding']}")

    # This is the simplest form of embedding!
    # Each word gets a unique integer ID

if __name__ == "__main__":
    main()
```
## 5. Running Your Code

### Method 1: Direct Execution with uv

```bash
uv run python lesson1_hello.py
```

### Method 2: VS Code Run Button

- Open `lesson1_hello.py` in VS Code
- Click the "Run Python File" button (▶️) in the top right
- Or press `Ctrl+F5` (Windows/Linux) or `Cmd+F5` (Mac)

### Method 3: Interactive Python (REPL)

```bash
uv run python
```

Then type:

```python
exec(open('lesson1_hello.py').read())
```
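The REPL is also useful for experimenting with the lesson's ideas one statement at a time, without a file at all. A sketch of what you might type at the `>>>` prompt:

```python
# Build the vocabulary mapping interactively, one statement at a time
words = ["hello", "world", "embedding"]
word_to_id = {word: i for i, word in enumerate(words)}

# At the REPL, an expression on its own line echoes its value
print(word_to_id["hello"])  # 0
print(word_to_id)           # {'hello': 0, 'world': 1, 'embedding': 2}
```

Exit the REPL with `exit()` or Ctrl+D (Ctrl+Z then Enter on Windows).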
### Method 4: Jupyter-style in VS Code

- Create a new file `lesson1_notebook.py`
- Add a `# %%` comment to mark the start of each cell
- Write your code in cells:

```python
# %%
# Cell 1: Define our vocabulary
words = ["hello", "world", "embedding"]
print(f"Vocabulary size: {len(words)}")

# %%
# Cell 2: Create word-to-ID mapping
word_to_id = {word: i for i, word in enumerate(words)}
print(word_to_id)

# %%
# Cell 3: Test our mapping
test_word = "hello"
print(f"'{test_word}' maps to ID: {word_to_id[test_word]}")
```

Click "Run Cell" above each `# %%` marker to execute cells individually.
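The percent format also supports text cells: VS Code's Jupyter extension treats `# %% [markdown]` as a markdown cell, which lets you keep notes alongside your code. A small sketch (the file name is up to you):

```python
# %% [markdown]
# # Lesson 1 notes
# Cells marked with `[markdown]` render as formatted text in the
# interactive window, so you can document your experiments inline.

# %%
# An ordinary code cell follows the notes
words = ["hello", "world", "embedding"]
print(f"Vocabulary size: {len(words)}")
```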
## 6. Understanding the Output

When you run the code, you'll see:

```text
Our vocabulary:
{'hello': 0, 'world': 1, 'embedding': 2}

The word 'hello' is represented as: 0
The word 'world' is represented as: 1
The word 'embedding' is represented as: 2
```
### Key Concept

**Embeddings convert words to numbers.** This is the fundamental idea behind all NLP: computers can't understand text directly, so we must represent words as numbers. Each word gets a unique representation, and this forms the foundation for more advanced embeddings like Word2Vec.
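One caveat worth noticing already: these integer IDs are arbitrary labels, not measurements, so arithmetic on them is meaningless. A small sketch of the problem:

```python
word_to_id = {"hello": 0, "world": 1, "embedding": 2}

# The numeric "distance" between IDs says nothing about meaning:
# 'hello' and 'world' differ by 1, 'hello' and 'embedding' by 2,
# but that does NOT make 'hello' semantically closer to 'world'.
print(abs(word_to_id["hello"] - word_to_id["world"]))      # 1
print(abs(word_to_id["hello"] - word_to_id["embedding"]))  # 2
```

This limitation is exactly what richer embeddings are designed to fix.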
## 7. Project Structure Best Practices

Your project should look like this:

```text
learn-embeddings/
├── pyproject.toml
├── .python-version
├── books/
│   ├── chapter-1/
│   │   ├── lesson-1.md
│   │   └── lesson-2.md
│   └── README.md
├── code/
│   ├── lesson1_hello.py
│   └── lesson1_notebook.py
└── README.md
```

Project structure explanation:

- `pyproject.toml` - Project configuration
- `.python-version` - Python version
- `books/` - Learning materials
- `code/` - Your code implementations
### Development Tip

Always organize your learning projects with a clear structure. Separate your learning materials (`books/`) from your code implementations (`code/`). This makes it easier to reference lessons while coding and keeps everything organized as you progress.
## Practice Exercises

### Exercise 1: Extend the Vocabulary

Add five more words to the vocabulary and print their IDs.

**Your task:** Modify the original code to include these words: "python", "machine", "learning", "neural", "network"
**Solution:**

```python
# Extended vocabulary with 5 more words
words = ["hello", "world", "embedding", "python", "machine", "learning", "neural", "network"]

word_to_id = {}
for i, word in enumerate(words):
    word_to_id[word] = i

print("Extended vocabulary:")
for word, word_id in word_to_id.items():
    print(f"'{word}': {word_id}")
```
**Output:**

```text
Extended vocabulary:
'hello': 0
'world': 1
'embedding': 2
'python': 3
'machine': 4
'learning': 5
'neural': 6
'network': 7
```
### Exercise 2: Reverse Mapping

Create an `id_to_word` dictionary that maps IDs back to words.

**Your task:** Given the `word_to_id` dictionary, create the reverse mapping so you can look up words by their ID numbers.
**Solution:**

```python
words = ["hello", "world", "embedding"]

# Create word-to-ID mapping
word_to_id = {word: i for i, word in enumerate(words)}

# Create reverse mapping (ID-to-word)
id_to_word = {i: word for word, i in word_to_id.items()}

print("Word to ID:", word_to_id)
print("ID to Word:", id_to_word)

# Test the reverse mapping
test_id = 1
print(f"ID {test_id} maps to word: '{id_to_word[test_id]}'")
```
**Output:**

```text
Word to ID: {'hello': 0, 'world': 1, 'embedding': 2}
ID to Word: {0: 'hello', 1: 'world', 2: 'embedding'}
ID 1 maps to word: 'world'
```
**Alternative approach using list indexing:**

```python
# Since our IDs are sequential starting from 0,
# we can use the original words list as the reverse mapping
words = ["hello", "world", "embedding"]
word_to_id = {word: i for i, word in enumerate(words)}

# words[test_id] gives us the word for that ID
test_id = 2
print(f"ID {test_id} maps to word: '{words[test_id]}'")
```
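With both mappings in hand, you can encode a whole sentence into IDs and decode it back again. A minimal round trip, sketched with words drawn only from our tiny vocabulary:

```python
words = ["hello", "world", "embedding"]
word_to_id = {word: i for i, word in enumerate(words)}
id_to_word = {i: word for word, i in word_to_id.items()}

# Encode a "sentence" (a list of known words) into IDs...
sentence = ["hello", "embedding", "world"]
encoded = [word_to_id[w] for w in sentence]   # [0, 2, 1]

# ...and decode the IDs back into words
decoded = [id_to_word[i] for i in encoded]

print(encoded)
print(decoded)  # the round trip recovers the original sentence
```

This encode/decode pattern is exactly what tokenizers in real NLP pipelines do at scale.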
### Exercise 3: Handle Unknown Words

What happens if you try to look up a word that's not in the vocabulary? Add error handling.

**Your task:** Try to look up a word like "python" that's not in your vocabulary. Then add proper error handling to gracefully handle unknown words.
**Solution:**

```python
words = ["hello", "world", "embedding"]
word_to_id = {word: i for i, word in enumerate(words)}

def get_word_id(word, word_to_id_dict):
    """Get a word's ID with proper error handling."""
    if word in word_to_id_dict:
        return word_to_id_dict[word]
    else:
        print(f"Warning: Word '{word}' not found in vocabulary!")
        return None

# Test with known words
print("Known word:", get_word_id("hello", word_to_id))
print("Known word:", get_word_id("world", word_to_id))

# Test with unknown word
print("Unknown word:", get_word_id("python", word_to_id))

# Alternative: Using try-except
def get_word_id_v2(word, word_to_id_dict):
    try:
        return word_to_id_dict[word]
    except KeyError:
        print(f"Error: '{word}' is not in vocabulary")
        return -1  # Return -1 for unknown words

print("Unknown word (v2):", get_word_id_v2("machine", word_to_id))

# Alternative: Using dict.get() with a default
def get_word_id_v3(word, word_to_id_dict, unknown_id=-1):
    return word_to_id_dict.get(word, unknown_id)

print("Unknown word (v3):", get_word_id_v3("learning", word_to_id))
```
**Output:**

```text
Known word: 0
Known word: 1
Warning: Word 'python' not found in vocabulary!
Unknown word: None
Error: 'machine' is not in vocabulary
Unknown word (v2): -1
Unknown word (v3): -1
```
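A common convention in real NLP pipelines, beyond what this exercise requires, is to reserve a dedicated `<UNK>` (unknown) token inside the vocabulary itself, so unknown words map to a valid ID rather than a sentinel like `-1`. A sketch of the idea (the token name and its position at ID 0 are just conventions):

```python
# Reserve ID 0 for the unknown-word token
words = ["<UNK>", "hello", "world", "embedding"]
word_to_id = {word: i for i, word in enumerate(words)}
UNK_ID = word_to_id["<UNK>"]

def get_word_id(word):
    # Unknown words fall back to the <UNK> ID instead of raising an error
    return word_to_id.get(word, UNK_ID)

print(get_word_id("hello"))   # 1
print(get_word_id("python"))  # 0 (unknown -> <UNK>)
```

This keeps every lookup inside the vocabulary's ID range, which matters once IDs index into embedding tables.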
## Key Takeaways

- ✅ **Environment Setup**: uv manages Python and packages simply
- ✅ **VS Code**: Provides multiple ways to run Python code
- ✅ **First Embedding**: Words can be represented as unique numbers
- ✅ **Foundation**: This integer-ID approach is the basis for all embeddings
## Next Lesson Preview

In Lesson 2, we'll improve our embeddings by representing words as vectors instead of single numbers. This allows us to capture relationships between words!