Getting the position of a tag in Beautiful Soup
Beautiful Soup provides two pieces of positional information about a tag:
the line number, which is accessed using the sourceline property
the starting index of the tag within that line, which is accessed using the sourcepos property
Consider the following HTML document:
from bs4 import BeautifulSoup

my_html = """
      <p>Alex is 5 years old</p>
      <p id="bob">Bob is <b>10</b> years old</p>
      <p>Cathy is 15 years old</p>
"""
soup = BeautifulSoup(my_html, "html.parser")
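To get the line number and the starting index of the <b> tag that holds Bob's age, find the tag and read its sourceline and sourcepos properties. Here they return 3 and 25, respectively.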
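As a minimal sketch of that lookup, the snippet below rebuilds the sample document, finds the <b> tag, and reads both properties. The six-space indentation inside my_html and the names b_tag, line_number, start_index, and raw_line are assumptions made for illustration; the indentation is chosen so that the offset matches the 25 quoted in the text. The final lines cross-check the reported offset against the raw string.

```python
from bs4 import BeautifulSoup

# Sample document. The six-space indentation is an assumption, chosen so
# that the <b> tag starts at offset 25 as quoted in the text.
my_html = """
      <p>Alex is 5 years old</p>
      <p id="bob">Bob is <b>10</b> years old</p>
      <p>Cathy is 15 years old</p>
"""

soup = BeautifulSoup(my_html, "html.parser")

# Locate the <b> tag that holds Bob's age.
b_tag = soup.find("b")

line_number = b_tag.sourceline  # 1-based line number of the tag
start_index = b_tag.sourcepos   # 0-based offset of "<" within that line

print(line_number)  # 3
print(start_index)  # 25

# Cross-check: the same line of the raw string should contain "<b>"
# at the reported offset.
raw_line = my_html.split("\n")[line_number - 1]
print(raw_line.index("<b>") == start_index)  # True
```

Because the offset is counted from the start of the line, changing the indentation of the sample changes sourcepos but not sourceline.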
Note the following:
We get a 3 for the line number instead of a 2 because we've added a line break after """ in the HTML document.
The starting index 25 means that 25 characters, counting any leading whitespace, come before the <b> tag on that line.
This only works when you're using either "html.parser" or "html5lib" as the parser.