Beautiful Soup Tag | get_text method
The Tag.get_text() method returns all the text within the tag, including the text of its descendants, as a single string.
Consider the following HTML document:
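from bs4 import BeautifulSoup

my_html = """<div><p>I like tea.</p><p>I like <b>soup</b>.</p>I like soda.</div>"""
# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(my_html, "html.parser")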
Extracting raw text
To extract all text:
I like tea.I like soup.I like soda.
Notice that the text is returned exactly as it appears in the document, so any whitespace around the tags is kept as-is, which can make the output look awkward.
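The call behind the output above can be sketched as follows. This is a minimal example that assumes the bs4 package is installed; the "html.parser" argument and the variable name raw_text are choices made for this illustration, not something the original specifies.

```python
from bs4 import BeautifulSoup

my_html = """<div><p>I like tea.</p><p>I like <b>soup</b>.</p>I like soda.</div>"""
# "html.parser" is an assumed parser choice (Python's built-in one)
soup = BeautifulSoup(my_html, "html.parser")

# With no arguments, get_text() concatenates every string in the tree as-is
raw_text = soup.get_text()
print(raw_text)  # I like tea.I like soup.I like soda.
```

Calling get_text() on a single tag, for example soup.find("p"), would return only the text inside that tag.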
Extracting stripped text
To solve the problem of awkward spacings, add the strip=True argument:
I like tea.I likesoup.I like soda.
This looks much cleaner.
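A minimal sketch of the stripped extraction, under the same assumptions as before (bs4 installed, "html.parser" as the parser; stripped_text is an illustrative name):

```python
from bs4 import BeautifulSoup

my_html = """<div><p>I like tea.</p><p>I like <b>soup</b>.</p>I like soda.</div>"""
soup = BeautifulSoup(my_html, "html.parser")  # parser choice is an assumption

# strip=True strips leading/trailing whitespace from each string before joining,
# and drops strings that are empty after stripping
stripped_text = soup.get_text(strip=True)
print(stripped_text)  # I like tea.I likesoup.I like soda.
```

Note that stripping applies to each string separately, so the trailing space in "I like " is removed before it is joined with "soup".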
Specifying a separator
To join the bits and pieces of text using "**" as the separator:
I like tea.**I like**soup**.**I like soda.
To explain the output, recall that the second paragraph of our HTML document was as follows:
<p>I like <b>soup</b>.</p>
get_text() joins the individual strings inside the tag with your specified separator, so the separator shows up wherever an opening or closing tag splits the text - that's all.
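The separator behaviour can be sketched as below. The example assumes bs4 is installed and uses "html.parser" as an assumed parser; it also passes strip=True, since the output shown has no space before the separator after "I like". The names pieces, joined and joined_unstripped are illustrative only.

```python
from bs4 import BeautifulSoup

my_html = """<div><p>I like tea.</p><p>I like <b>soup</b>.</p>I like soda.</div>"""
soup = BeautifulSoup(my_html, "html.parser")  # parser choice is an assumption

# The individual strings that get_text() joins together
pieces = list(soup.stripped_strings)
print(pieces)  # ['I like tea.', 'I like', 'soup', '.', 'I like soda.']

# Join the stripped strings with "**"
joined = soup.get_text(separator="**", strip=True)
print(joined)  # I like tea.**I like**soup**.**I like soda.

# Without strip=True the original whitespace inside each string is kept
joined_unstripped = soup.get_text(separator="**")
print(joined_unstripped)  # I like tea.**I like **soup**.**I like soda.
```

Listing the strings first makes it easier to see where the separator will land: once between each adjacent pair of strings.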