Extract data from mixed format

Advanced

Product descriptions often contain embedded codes in inconsistent formats. Your task is to extract product codes regardless of the label or position used.

You have a list of product descriptions in column A. Each description contains a product code in the format XXX-0000 (three letters, hyphen, four digits). The challenge: the code appears with different labels and in different positions.

The variations you'll encounter

The product codes use different labels:

  • SKU: followed by the code
  • COD: followed by the code
  • ITM: followed by the code
  • REF: followed by the code

The position varies - some codes appear in parentheses (), some in brackets [], and some inline with commas.

Key functions to use

  • SEARCH: Find the position of a text string (case-insensitive)
  • MID: Extract characters from a specific position
  • IFERROR: Handle errors when a search pattern isn't found

Your task

Fill column B with the extracted product codes for all 10 products (rows 2-11). Each result should be exactly 8 characters in the format ABC-1234.

Need some help?

Hint 1

Use SEARCH to find each label pattern. For example, SEARCH("SKU: ",A2) returns the position where "SKU: " starts.

Hint 2

Wrap SEARCH calls in IFERROR to try multiple patterns: IFERROR(SEARCH("SKU: ",A2), IFERROR(SEARCH("COD: ",A2), ...)) returns the first match found.

Hint 3

All labels are the same length, so you can add 5 and use MID to extract 8 characters.

Related function(s)

Exercise