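from bs4 import BeautifulSoup

my_html = """
       <div>
              <p>I like tea.</p>
              <p>I like <b>soup</b>.</p>
              I like soda.
       </div>
"""
# Naming the parser explicitly avoids Beautiful Soup's "no parser was explicitly specified" warning
soup = BeautifulSoup(my_html, "html.parser")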

Extracting raw text

To extract all text:

print(soup.get_text())

              I like tea.
              I like soup.
              I like soda.

Notice that the output keeps the line breaks and indentation of the original HTML, which makes it awkward to work with.

Extracting stripped text

To remove the surrounding whitespace, pass strip=True:

print(soup.get_text(strip=True))

I like tea.I likesoup.I like soda.

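The surrounding whitespace is gone. Note, however, that each piece of text is stripped individually and the pieces are then joined with nothing in between, which is why "I like" and "soup" run together.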
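
To make the difference concrete, the sketch below compares strip=True with calling Python's own str.strip() on the result of get_text(). The variable names trimmed_ends and stripped_each are illustrative only, and "html.parser" is an assumption of this example.

```python
from bs4 import BeautifulSoup

my_html = """
       <div>
              <p>I like tea.</p>
              <p>I like <b>soup</b>.</p>
              I like soda.
       </div>
"""
soup = BeautifulSoup(my_html, "html.parser")  # parser choice is an assumption

# str.strip() only trims the two ends of the final string,
# so the inner line breaks and indentation survive.
trimmed_ends = soup.get_text().strip()

# strip=True strips every piece of text before joining them,
# and drops pieces that are whitespace only.
stripped_each = soup.get_text(strip=True)

print(repr(trimmed_ends))
print(repr(stripped_each))
```

Which of the two is preferable depends on whether the line structure of the original document matters for your use case.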

Specifying a separator

To join the bits and pieces of text using "**" as the separator:

print(soup.get_text("**", strip=True))

I like tea.**I like**soup**.**I like soda.

To explain the output, recall that the middle line of our HTML document was as follows:

<p>I like <b>soup</b>.</p>

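The separator is inserted between each pair of adjacent pieces of text. Because the <b> tag splits this line into three pieces ("I like", "soup" and "." after stripping), the line alone contributes I like**soup**. to the output.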
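
One way to check this is to look at the individual pieces through the stripped_strings generator and join them by hand. This is a sketch for illustration rather than a description of how get_text() is implemented; the names pieces and joined are hypothetical, and "html.parser" is an assumption.

```python
from bs4 import BeautifulSoup

my_html = """
       <div>
              <p>I like tea.</p>
              <p>I like <b>soup</b>.</p>
              I like soda.
       </div>
"""
soup = BeautifulSoup(my_html, "html.parser")  # parser choice is an assumption

# stripped_strings yields each piece of text with surrounding whitespace
# removed, skipping pieces that are whitespace only.
pieces = list(soup.stripped_strings)
print(pieces)

# Joining the pieces manually gives the same string as get_text("**", strip=True)
joined = "**".join(pieces)
print(joined == soup.get_text("**", strip=True))
```

Working with the list directly is handy when you need to filter or post-process individual pieces before joining them.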

Published by Isshin Inada
