Preface xiii
About the Companion Website xvii
Introduction xix
1 Open-Source Tools for Data Science 1
1.1 R Language and RStudio 1
1.2 Python Language and Tools 5
1.3 Advanced Plain Text Editor 8
1.4 CSV Format for Datasets 8
2 Simple Exploratory Data Analysis 13
2.1 Missing Values Analysis 13
2.2 R: Descriptive Statistics and Utility Functions 15
2.3 Python: Descriptive Statistics and Utility Functions 17
3 Data Organization and First Data Frame Operations 23
3.1 R: Read CSV Datasets and Column Selection 24
3.2 R: Rename and Relocate Columns 36
3.3 R: Slicing, Column Creation, and Deletion 38
3.4 R: Separate and Unite Columns 45
3.5 R: Sorting Data Frames 49
3.6 R: Pipe 55
3.7 Python: Column Selection 59
3.8 Python: Rename and Relocate Columns 67
3.9 Python: NumPy Slicing, Selection with Index, Column Creation and Deletion 69
3.10 Python: Separate and Unite Columns 81
3.11 Python: Sorting Data Frames 85
4 Subsetting with Logical Conditions 99
4.1 Logical Operators 99
4.2 R: Row Selection 101
5 Operations on Dates, Strings, and Missing Values 127
5.1 R: Operations on Dates and Strings 129
5.2 R: Handling Missing Values and Data Type Transformations 141
5.3 R: Example with Dates, Strings, and Missing Values 154
5.4 Python: Operations on Dates and Strings 165
5.5 Python: Handling Missing Values and Data Type Transformations 173
5.6 Python: Examples with Dates, Strings, and Missing Values 182
6 Pivoting and Wide-long Transformations 195
6.1 R: Pivoting 197
6.2 Python: Pivoting 202
7 Groups and Operations on Groups 221
7.1 R: Groups 222
7.2 Python: Groups 244
8 Conditions and Iterations 271
8.1 R: Conditions and Iterations 272
8.2 Python: Conditions and Iterations 284
9 Functions and Multicolumn Operations 307
9.1 R: User-defined Functions 308
9.2 R: Multicolumn Operations 316
9.3 Python: User-defined and Lambda Functions 330
10 Join Data Frames 347
10.1 Basic Concepts 348
10.2 Python: Join Operations 369
11 List/Dictionary Data Format 393
11.1 R: List Data Format 395
11.2 R: JSON Data Format and Use Cases 410
11.3 Python: Dictionary Data Format 422
Questions 443
Index 447