Here I analyse the US Social Security baby name catalogue, which reports the names given to male and female newborns for every year since 1880.
I need to load all the tables and concatenate them into a single data frame. To avoid confusing data from different years, we can prepare the individual data frames by adding a new column that specifies the year. To do this on the fly, directly on the output of `read_csv`, we can chain the DataFrame `assign` method.
We managed to load each file in a one-liner, so you can see that I'm going to use a comprehension to concatenate all the data frames.
This piece of code does several things. We loop over all the years between 1880 and 2018. We build up the file name using an f-string and feed that into `read_csv`. We specify the column names, and we add the column that gives the correct year from the loop variable. Finally, we pass all the resulting data frames to `pd.concat`.
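The loading step above can be sketched as a small helper. The `yob{year}.txt` filename pattern follows the SSA download format, but treat it, and the file location, as assumptions about your own setup:

```python
import pandas as pd

def load_names(first_year, last_year, pattern="yob{year}.txt"):
    """Load one file per year, tag each frame with its year, and
    concatenate everything into a single DataFrame.

    Assumes SSA-style files with columns name, sex, number and no
    header row; the filename pattern is an assumption.
    """
    return pd.concat(
        pd.read_csv(pattern.format(year=year),
                    names=["name", "sex", "number"]).assign(year=year)
        for year in range(first_year, last_year + 1)
    )
```

The generator expression inside `pd.concat` is the comprehension mentioned above: each `read_csv(...).assign(year=year)` call produces one tagged frame, and `concat` stacks them all.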
What if I want to look at the variants of the same name, like Claire?
Yearly top ten names: tracking the popularity of a name across years
Plotting a graph to analyse the change in popularity over time
All-time favourite baby names
Top ten unisex baby names
We’ll load our data set as usual. We need to compute the total number of boys and girls for a given name. This seems a good place to use `groupby`, which lets us segment the data before applying an aggregation, in this case the sum of the number of babies. So we group by sex and name, select the number column and take the sum. From this series with a MultiIndex, we can grab the males and the females respectively, using `.loc`. As you see, the two indices are going to be different. Nevertheless, we can combine the two series and pandas will align the indices for us. The result will be NaN wherever either series doesn’t have an element, for instance when we check where the ratio between males and females is less than two. We can get rid of those NaNs with `dropna`. Now, remember the definition of unisex names as those with a ratio between 0.5 and 2. This is a good expression for fancy indexing, and after we apply it, we see that 1,660 names pass the test. Here, I’ve taken the index, because we don’t actually need the ratio itself, just the names.
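Put together, the steps above might look like this. The column names follow the loading step earlier; the function name is mine:

```python
import pandas as pd

def unisex_names(names):
    """Return the names whose male/female ratio lies between 0.5 and 2.

    Expects a DataFrame with columns sex, name, number, as loaded
    from the SSA baby-name files.
    """
    # Total babies per (sex, name) pair -> Series with a MultiIndex.
    totals = names.groupby(["sex", "name"])["number"].sum()
    # Grab males and females with .loc; pandas aligns the two name
    # indices on division, producing NaN where a name appears for
    # only one sex.
    ratio = (totals.loc["M"] / totals.loc["F"]).dropna()
    # Fancy indexing with the unisex condition; keep only the index,
    # since we want the names, not the ratios themselves.
    return ratio[(ratio > 0.5) & (ratio < 2)].index
```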
The Spoonacular API is a food and recipes API that gives you access to an online database with hundreds of thousands of recipes, products and ingredients. Using Python, I wrote a script that takes user input (what ingredients do you have? What are your dietary requirements?), sends a request to the API, and then returns a tailored list of recipes. The list also provides the user with calorie information about the different dishes. There are many other endpoints too, including the option to receive a random food joke, wine recommendations, and recipes based on your carbohydrate limits.
Choosing a recipe
In my script, I ask users: what’s in your store cupboard? You can add ingredients like egg or tomato, separating each one with a comma.
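A minimal sketch of that flow, from comma-separated input to an API request. The endpoint and parameter names follow Spoonacular's find-by-ingredients search, but treat them as assumptions and check the current API documentation before relying on them:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint; verify against Spoonacular's current docs.
API_URL = "https://api.spoonacular.com/recipes/findByIngredients"

def parse_ingredients(raw):
    """Turn comma-separated user input into a clean ingredient list."""
    return [item.strip().lower() for item in raw.split(",") if item.strip()]

def find_recipes(raw_input, api_key, number=5):
    """Send the user's ingredients to the API and return recipe titles."""
    query = urlencode({
        "ingredients": ",".join(parse_ingredients(raw_input)),
        "number": number,
        "apiKey": api_key,
    })
    with urlopen(f"{API_URL}?{query}") as response:
        recipes = json.load(response)
    return [recipe["title"] for recipe in recipes]
```

Separating the input parsing from the request keeps the user-facing step testable without touching the network.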
The Python programming language can scrape information from a web page (an HTML page). Here, I scrape the table element from the performance data page provided by the Association of Real Estate Funds (AREF). This index shows the performance of property funds on a quarterly basis; here I look at the quarter ending March 2020. Both the Index and the Property Fund Vision enable investors and their advisers to compare fund performance and other relevant data against appropriate alternative funds, either individually or at an aggregated level. I parse the page into a BeautifulSoup object (“soup”), so that Python can read its structure, and then write the table into a CSV file.
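A minimal sketch of the soup-to-CSV step, assuming the page's first `<table>` element is the performance table (fetching the HTML itself, e.g. with `urllib` or `requests`, is left out):

```python
import csv
from bs4 import BeautifulSoup

def table_to_csv(html, out_path):
    """Parse the first HTML table in `html` and write it to a CSV file."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        # Each <tr> becomes one CSV row; headers and data cells alike.
        for row in table.find_all("tr"):
            cells = row.find_all(["th", "td"])
            writer.writerow(cell.get_text(strip=True) for cell in cells)
```

Passing the HTML in as a string, rather than fetching inside the function, makes the parsing step easy to test on a saved copy of the page.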