
PHP Text Auto Detect Code

The document outlines a routine for extracting a name and an address from a set of text lines. Name candidates are scored: a line that looks like a name and has no address keyword scores 3, a name-like line that does contain an address keyword scores 2, and the highest-scoring candidate is chosen. If there are no candidates, the code falls back to the first line that does not contain a run of two or more digits. Finally, it escapes the extracted name, phone, address, and price for output.
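
The selection step described above can be sketched in isolation. This is a minimal illustration, not the uploader's code: the function name `sketch_pick_name`, its parameters, and the sample data are all hypothetical, and the fallback here skips the Bengali-digit conversion that the original performs through its own helper.

```php
<?php
/**
 * Hypothetical sketch: pick a name from scored candidates, falling back
 * to the first line that has no run of two or more ASCII digits.
 */
function sketch_pick_name(array $candidates, array $lines): string
{
    if ($candidates !== []) {
        usort($candidates, function (array $a, array $b): int {
            // Higher score first; on a tie, the earlier line index wins.
            return ($b['score'] <=> $a['score']) ?: ($a['idx'] <=> $b['idx']);
        });
        return $candidates[0]['text'];
    }
    foreach ($lines as $line) {
        if (!preg_match('/\d{2,}/', $line)) {
            return trim($line);
        }
    }
    // Nothing qualified: use the first line, or an empty string.
    return trim($lines[0] ?? '');
}

// Made-up sample input, for illustration only.
$candidates = [
    ['score' => 2, 'text' => 'Mirpur Road', 'idx' => 0],
    ['score' => 3, 'text' => 'Rahim Uddin', 'idx' => 2],
    ['score' => 3, 'text' => 'Karim Mia', 'idx' => 1],
];
echo sketch_pick_name($candidates, []), "\n"; // Karim Mia
```

Making the index an explicit tie-breaker means the result does not depend on whether the sort is stable.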

Uploaded by

shuvo.adsns2

    if ($is_namey && !$has_addr_kw) {
        $name_candidates[] = ['score' => 3, 'text' => $ln_clean, 'idx' => $idx];
    } elseif ($is_namey) {
        $name_candidates[] = ['score' => 2, 'text' => $ln_clean, 'idx' => $idx];
    }
}

// pick name: highest score wins; on a tie, the earlier line wins
$name = '';
if (!empty($name_candidates)) {
    // sort by score desc, then by line index asc. The explicit idx
    // tie-break matters because usort() is not stable before PHP 8.0.
    usort($name_candidates, function ($a, $b) {
        return ($b['score'] <=> $a['score']) ?: ($a['idx'] <=> $b['idx']);
    });
    $name = $name_candidates[0]['text'];
} else {
    // fallback: first line without a run of two or more digits
    $fallbacks = array_filter($lines_no_label, function ($ln) {
        return !preg_match('/\d{2,}/', oe_bn_to_en_digits($ln));
    });
    if (!empty($fallbacks)) {
        // trim so the $ln_clean === $name check in the address pass matches
        $name = trim(array_values($fallbacks)[0]);
    } else {
        $name = trim($lines_no_label[0] ?? '');
    }
}

// ADDRESS = leftover lines (without pure phone/price/name)
$address_parts = [];
foreach ($lines as $idx => $ln) {
    // guard against indexes missing from $lines_no_label
    $ln_clean = trim($lines_no_label[$idx] ?? '');
    if ($ln_clean === '' || $ln_clean === $name) continue;

    // skip the line if its digits contain the detected phone number
    $ln_digits = preg_replace('/\D+/', '', oe_bn_to_en_digits($ln_clean));
    if ($phone && $ln_digits && strpos($ln_digits, $phone) !== false) continue;

    // skip lines that have a price keyword together with a number of two or more digits
    $maybe_price_only = preg_match('/^(.*(৳|টাকা|\btk\b|\btaka\b|price|amount|total|cod).*)$/iu', $ln_clean);
    $has_big_number = preg_match('/\b\d{2,}\b/u', oe_bn_to_en_digits($ln_clean));
    if ($maybe_price_only && $has_big_number) continue;

    $address_parts[] = $ln_clean;
}
// reorder: keep lines with address keywords first
$with_kw = [];
$without_kw = [];
foreach ($address_parts as $ap) {
    $ap_en_lower = strtolower(oe_bn_to_en_digits($ap));
    if (oe_has_address_keyword($ap_en_lower)) {
        $with_kw[] = $ap;
    } else {
        $without_kw[] = $ap;
    }
}
$address = trim(implode(' ', array_merge($with_kw, $without_kw)));

// FINAL sanitize for output
$name = esc_html($name);
$phone = esc_html($phone);
$address = esc_html($address);
$price = esc_html((string)$price);
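
The fragment calls `oe_bn_to_en_digits()` several times but does not include its definition. Judging only by its name and by how its result is fed to digit-matching patterns, it presumably maps Bengali numerals to ASCII digits. The sketch below shows one way such a helper might look; `sketch_bn_to_en_digits` is a hypothetical stand-in, and the real helper may behave differently.

```php
<?php
/**
 * Hypothetical stand-in for oe_bn_to_en_digits(), which the original
 * fragment does not show. Maps Bengali numerals to ASCII digits and
 * leaves every other character untouched.
 */
function sketch_bn_to_en_digits(string $text): string
{
    static $map = [
        '০' => '0', '১' => '1', '২' => '2', '৩' => '3', '৪' => '4',
        '৫' => '5', '৬' => '6', '৭' => '7', '৮' => '8', '৯' => '9',
    ];
    // strtr() with an array replaces whole byte sequences, so the
    // multibyte UTF-8 numerals are matched as complete characters.
    return strtr($text, $map);
}

// After conversion, an ASCII-only pattern such as /\D+/ can strip
// everything except the digits, as the phone check above does.
$digits = preg_replace('/\D+/', '', sketch_bn_to_en_digits('ফোন: ০১৭১২'));
```

Converting before matching lets the patterns stay ASCII-only, so `\d` does not need to recognise Bengali numerals itself.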
