Python Programming Blueprints
上QQ阅读APP看书,第一时间看更新

Implementing today's weather forecast

It's time to start adding the implementation of the _today_forecast method, but first, we need to import BeautifulSoup. At the top of the file, add the following import statement:

from bs4 import BeautifulSoup

Now, we can start adding the _today_forecast method:

def _today_forecast(self, args):
criteria = {
'today_nowcard-temp': 'div',
'today_nowcard-phrase': 'div',
'today_nowcard-hilo': 'div',
}

content = self._request.fetch_data(args.forecast_option.value,
args.area_code)

bs = BeautifulSoup(content, 'html.parser')

container = bs.find('section', class_='today_nowcard-container')

weather_conditions = self._parse(container, criteria)

if len(weather_conditions) < 1:
raise Exception('Could not parse weather foreecast for
today.')

weatherinfo = weather_conditions[0]

temp_regex = re.compile(('H\s+(\d+|\-{,2}).+'
'L\s+(\d+|\-{,2})'))
temp_info = temp_regex.search(weatherinfo['today_nowcard-hilo'])
high_temp, low_temp = temp_info.groups()

side = container.find('div', class_='today_nowcard-sidecar')
humidity, wind = self._get_additional_info(side)

curr_temp = self._clear_str_number(weatherinfo['today_nowcard-
temp'])

self._unit_converter.dest_unit = args.unit

td_forecast = Forecast(self._unit_converter.convert(curr_temp),
humidity,
wind,
high_temp=self._unit_converter.convert(
high_temp),
low_temp=self._unit_converter.convert(
low_temp),
description=weatherinfo['today_nowcard-
phrase'])

return [td_forecast]

That is the function that will be called when the -td or --today flag is used on the command line. Let's break down this code so that we can easily understand what it does. Understanding this method is important because these methods parse data from other weather forecast options (five days, ten days, and weekend) that are very similar to this one.

The method's signature is quite simple; it only gets args, which is the Argument object that is created in the __main__ method. The first thing we do in this method is to create a criteria dictionary with all the DOM elements that we want to find in the markup:

criteria = {
'today_nowcard-temp': 'div',
'today_nowcard-phrase': 'div',
'today_nowcard-hilo': 'div',
}

As mentioned before, the key to the criteria dictionary is the name of the DOM element's CSS class, and the value is the type of the HTML element:

  • The today_nowcard-temp class is a CSS class of the DOM element containing the current temperature
  • The today_nowcard-phrase class is a CSS class of the DOM element containing weather conditions text (Cloudy, Sunny, and so on)
  • The today_nowcard-hilo class is the CSS class of the DOM element containing the highest and lowest temperature

Next, we are going to fetch, create, and use BeautifulSoup to parse the DOM:

content = self._request.fetch_data(args.forecast_option.value, 
args.area_code)

bs = BeautifulSoup(content, 'html.parser')

container = bs.find('section', class_='today_nowcard-container')

weather_conditions = self._parse(container, criteria)

if len(weather_conditions) < 1:
raise Exception('Could not parse weather forecast for today.')

weatherinfo = weather_conditions[0]

First, we make use of the fetch_data method of the Request class that we created on the core module and pass two arguments; the first is the forecast option and the second argument is the area code that we passed on the command line.
After fetching the data, we create a BeautifulSoup object passing the content and a parser. Since we are getting back HTML, we use html.parser.

Now is the time to start looking for the HTML elements that we are interested in. Remember, we need to find an element that will be a container, and the _parser function will search through the children elements and try to find items that we defined in the dictionary criteria. For today's weather forecast, the element that contains all the data we need is a section element with the today_nowcard-container CSS class.

BeautifulSoup contains the find method, which we can use to find elements in the HTML DOM with specific criteria. Note that the keyword argument is called class_ and not class because class is a reserved word in Python.

Now that we have the container element, we can pass it to the _parse method, which will return a list. We perform a check if the result list contains at least one element and raise an exception if it is empty. If it is not empty, we just get the first element and assign it to the weatherinfo variable. The weatherinfo variable now contains a dictionary with all the items that we were looking for.

The next step is split the highest and lowest temperature:

temp_regex = re.compile(('H\s+(\d+|\-{,2}).+'
'L\s+(\d+|\-{,2})'))
temp_info = temp_regex.search(weatherinfo['today_nowcard-hilo'])
high_temp, low_temp = temp_info.groups()

We want to parse the text that has been extracted from the DOM element with the today_nowcard-hilo CSS class, and the text should look something like H 50 L 60H -- L 60, and so on. An easy and simple way of extracting the text we want is to use a regular expression:

H\s+(\d+|\-{,2}).L\s+(\d+|\-{,2})

We can break this regular expression into two parts. First, we want to get the highest temperature—H\s+(\d+|\-{,2}); this means that it will match an H followed by some spaces, and then it will group a value that matches either numbers or a maximum of two dash symbols. After that, it will match any character. Lastly, comes the second part that basically does the same; however, it starts matching an L.

After executing the search method, it gets regular expression groups that have been returned calling the groups() function, which in this case will return two groups, one for the highest temperature and the second for the lowest.

Other information that we want to provide to our users is information about wind and humidity. The container element that contains this information has a CSS class called today_nowcard-sidecar:

side = container.find('div', class_='today_nowcard-sidecar')
wind, humidity = self._get_additional_info(side)

We just find the container and pass it into the _get_additional_info method that will loop through the children elements of the container, extracting the text and finally returning the results for us.

Finally, the last part of this method:

curr_temp = self._clear_str_number(weatherinfo['today_nowcard-temp'])

self._unit_converter.dest_unit = args.unit

td_forecast = Forecast(self._unit_converter.convert(curr_temp),
humidity,
wind,
high_temp=self._unit_converter.convert(
high_temp),
low_temp=self._unit_converter.convert(
low_temp),
description=weatherinfo['today_nowcard-
phrase'])

return [td_forecast]

Since the current temperature contains a special character (degree sign) that we don't want to have at this point, we use the _clr_str_number method to pass the today_nowcard-temp item of the weatherinfo dictionary.

Now that we have all the information we need, we construct the Forecast object and return it. Note that we are returning an array here; this is because all other options that we are going to implement (five-day, ten-day, and weekend forecasts) will return a list, so to make it consistent; also to facilitate when we will have to display this information on the terminal, we are also returning a list.

Another thing to note is that we are making use of the convert method of our UnitConverter to convert all the temperatures to the unit selected in the command line.

When running the command again:

$ python -m weatherterm -u Fahrenheit -a SWXX2372:1:SW -p WeatherComParser -td

You should see an output similar to this:

Congratulations! You have implemented your first web scraping application. Next up, let's add the other forecast options.