[CS50's - Introduction to Python] Regular Expressions
In this post I will show how to solve the problems from topic 6 - Regular Expressions of CS50’s - Introduction to Python course.
Table of Contents
NUMB3RS
In this problem we must implement a function called validate
that expects an IPv4 address as input as a str and then returns True or False, respectively, if that input is a valid IPv4 address or not.
The tructure of numb3rs.py should be as follows, wherein we’re welcome to modify main and/or implement other functions as we see fit, but we may not import any other libraries. We’re welcome, but not required, to use re and/or sys.
import re
import sys
def main():
print(validate(input("IPv4 Address: ")))
def validate(ip):
...
...
if __name__ == "__main__":
main()
My solution for this problem is:
- Import re library to be able to use the search and the groups methods. A good place to test regular expressions before including it in the code is https://regexr.com/.
import re
def main():
print(validate(input("IPv4 Address: ")))
def validate(ip):
if matches := re.search(r"^([0-9]+)\.([0-9]+)\.([0-9]+)\.([0-9]+)$", ip):
for matche in matches.groups():
matche = int(matche)
print(matche)
if matche < 0 or matche > 255:
return False
return True
return False
if __name__ == "__main__":
main()
After inplementing numb3rs.py
I implemented test_numb3rs.py
to test the validate
function as requested.
import numb3rs
def test_valid():
assert numb3rs.validate("127.0.0.1") == True
assert numb3rs.validate("255.255.255.255") == True
def test_invalid():
assert numb3rs.validate("1") == False
assert numb3rs.validate("1.2") == False
assert numb3rs.validate("1.2.3") == False
assert numb3rs.validate("1.512.512.512") == False
assert numb3rs.validate("1.2.512.512") == False
assert numb3rs.validate("1.2.3.1000") == False
assert numb3rs.validate("4000.3.2.1") == False
assert numb3rs.validate("4000.3000.2.1") == False
assert numb3rs.validate("4000.3000.2000.1") == False
assert numb3rs.validate("4000.3000.2000.1000") == False
assert numb3rs.validate("a.b.c.d") == False
assert numb3rs.validate("cat") == False
Watch on YouTube
The objective in this problem is to implement a function called parse
that expects a str of HTML as input, and then extracts any YouTube URL that’s the value of a src attribute of an iframe element therein, and returns its shorter, shareable youtu.be
equivalent as a str. Expect that any such URL will be in one of the formats below. We must assume that the value of src will be surrounded by double quotes. And assume that the input will contain no more than one such URL. If the input does not contain any such URL at all, return None.
- http://youtube.com/embed/xvFZjo5PgG0
- https://youtube.com/embed/xvFZjo5PgG0
- https://www.youtube.com/embed/xvFZjo5PgG0
The structure of watch.py must be as follows:
import re
import sys
def main():
print(parse(input("HTML: ")))
def parse(s):
...
...
if __name__ == "__main__":
main()
My solution for this problem was:
- Import re library to be able to use the search and the group methods. A good place to test regular expressions before including it in the code is https://regexr.com/.
import re
def main():
print(parse(input("HTML: ")))
def parse(s):
if matches := re.search(r"src=\"(.+?)\"", s):
if embURL := re.search(r"youtube.com/embed/(.+)", matches.group(1)):
return f"https://youtu.be/{embURL.group(1)}"
return "None"
if __name__ == "__main__":
main()
Working 9 to 5
In this problem we are asked to implement a function called convert
that expects a str in either of the 12-hour formats below and returns the corresponding str in 24-hour format (i.e., 9:00 to 17:00). Expect that AM and PM will be capitalized (with no periods therein) and that there will be a space before each. Assume that these times are representative of actual times, not necessarily 9:00 AM and 5:00 PM specifically.
- 9:00 AM to 5:00 PM
- 9 AM to 5 PM
Raise a ValueError instead if the input to convert is not in either of those formats or if either time is invalid (e.g., 12:60 AM, 13:00 PM, etc.). But do not assume that someone’s hours will start ante meridiem and end post meridiem; someone might work late and even long hours (e.g., 5:00 PM to 9:00 AM).
The structure for working.py shuld be as follows:
import re
import sys
def main():
print(convert(input("Hours: ")))
def convert(s):
...
...
if __name__ == "__main__":
main()
A solution for this problem can be:
- Import re library to be able to use the search and the group methods. A good place to test regular expressions before including it in the code is https://regexr.com/.
import re
def main():
print(convert(input("Hours: ")))
def convert(s):
if matches := re.search(r"^(([0-9][0-2]*):*([0-5][0-9])*) ([A-P]M) to (([0-9][0-2]*):*([0-5][0-9])*) ([A-P]M)$", s):
first_hour = time(matches.group(2), matches.group(3), matches.group(4))
second_hour = time(matches.group(6), matches.group(7), matches.group(8))
return f"{first_hour} to {second_hour}"
else:
raise ValueError
def time(hour, minutes, type):
hour = int(hour)
if minutes != None:
minutes = int(minutes)
else:
minutes = 0
if minutes > 59:
raise ValueError
if type == "PM" and hour < 12:
hour += 12
elif type == "AM" and hour == 12:
hour = 0
return f"{hour:02}:{minutes:02}"
if __name__ == "__main__":
main()
The program test_working.py
tests the convert
function as requested.
import pytest
import working
def test_hour_minute():
assert working.convert("9:00 AM to 5:00 PM") == "09:00 to 17:00"
assert working.convert("10:30 PM to 8:50 AM") == "22:30 to 08:50"
def test_onlyHour():
assert working.convert("9 AM to 5 PM") == "09:00 to 17:00"
assert working.convert("10 PM to 8 AM") == "22:00 to 08:00"
def test_valueError():
with pytest.raises(ValueError):
working.convert("9:60 AM to 5:60 PM")
with pytest.raises(ValueError):
working.convert("9 AM - 5 PM")
with pytest.raises(ValueError):
working.convert("09:00 AM - 17:00 PM")
Regular, um, Expressions
In this problem we must implement a function called count
that expects a line of text as input as a str and returns, as an int, the number of times that “um” appears in that text, case-insensitively, as a word unto itself, not as a substring of some other word. For instance, given text like hello, um, world, the function should return 1. Given text like yummy, though, the function should return 0.
The structure of um.py
should be as follows:
import re
import sys
def main():
print(count(input("Text: ")))
def count(s):
...
...
if __name__ == "__main__":
main()
My solution for this problem was:
- Import re library to be able to use the findall and the IGNORECASE methods. A good place to test regular expressions before including it in the code is https://regexr.com/.
import re
def main():
print(count(input("Text: ")))
def count(s):
return len(re.findall(r"\b\W*um\W*\b", s, re.IGNORECASE))
if __name__ == "__main__":
main()
To test count
the following program can be used:
import um
def test_words():
assert um.count("um") == 1
assert um.count("UM") == 1
assert um.count("yummi") == 0
assert um.count("ALBUM") == 0
def test_phrase():
assert um.count("Um, thanks for the album.") == 1
assert um.count("Um? Mum? Is this that album where, um, umm, the clumsy alums play drums?") == 2
assert um.count("This is so yummi") == 0
def test_expression():
assert um.count("um?") == 1
assert um.count("Hum!?") == 0
assert um.count("Um, thanks, um...") == 2
Response Validation
This problem asks to implement a program that prompts the user for an email address via input and then prints Valid or Invalid, respectively, if the input is a syntatically valid email address. You may not use re. And do not validate whether the email address’s domain name actually exists.
In this case we can use the libraries validator-collection or validators from PyPI.
First we need to install one of either libriries:
- validator-collection:
pip install validator-collection
- validators:
pip install validators
I decided to use validator, so my solution for this problem is as follows:
import validators
def main():
print(validation(input("Text: ")))
def validation(s):
if validators.email(s):
return "Valid"
else:
return "Invalid"
if __name__ == "__main__":
main()
This is all for week seven - Regular Expressions, set of problems of CS50’s Introduction to Programming with Python course.