What about this (see the explanatory `::` comments):
@echo off
::This script assumes that the lines of the input file (provided as command line argument)
::do not contain any of the characters `^`, `!`, and `"`. The lines may be of different
::lengths, empty lines are ignored though.
::The script processes the input file in two phases:
::1. let us call this the analysis phase, which consists of the following steps:
::   * read the first line of the file, store the string and determine its length;
::   * read the second line, walk through all characters beginning from the left and from
::     the right side within the same loop, find the character indexes that point to the
::     first left-most and the last right-most character that do not equal the respective
::     ones in the string from the first line, and store the retrieved indexes;
::   * read the remaining lines, and for each one, extract the prefix and the suffix that
::     are indicated by the respective stored indexes and compare them with the respective
::     prefix and suffix from the first line; if both are equal, leave the loop body here
::     and continue with the next line; otherwise, walk through all characters beginning
::     before the previous left-most and after the previous right-most character indexes
::     towards the respective ends of the string, find the character indexes that again
::     point to the first left-most and the last right-most character that do not equal
::     the respective ones in the string from the first line, and update the previously
::     stored indexes accordingly;
::2. let us call this the execution phase, which reads the input file again, extracts the
::   portion of each line that is indicated by the two computed indexes and returns it;
::The output is displayed in the console; to write it to a file, use redirection (`>`).
setlocal EnableDelayedExpansion
set "MIN=" & set "MAX=" & set /A "ROW=0"
for /F usebackq^ delims^=^ eol^= %%L in ("%~1") do (
    set /A "ROW+=1" & set "STR=%%L"
    if !ROW! equ 1 (
        call :LENGTH LEN "%%L"
        set "SAV=%%L"
    ) else if !ROW! equ 2 (
        set /A "IDX=LEN-1"
        for /L %%I in (0,1,!IDX!) do (
            if not defined MIN (
                if not "!STR:~%%I,1!"=="!SAV:~%%I,1!" set /A "MIN=%%I"
            )
            if not defined MAX (
                set /A "IDX=%%I+1"
                for %%J in (!IDX!) do (
                    if not "!STR:~-%%J,1!"=="!SAV:~-%%J,1!" set /A "MAX=1-%%J"
                )
            )
        )
        if not defined MIN set /A "MIN=LEN, MAX=-LEN"
    ) else (
        set "NXT=#"
        if !MIN! gtr 0 for %%I in (!MIN!) do if not "!STR:~,%%I!"=="!SAV:~,%%I!" set "NXT="
        if !MAX! lss 0 for %%J in (!MAX!) do if not "!STR:~%%J!"=="!SAV:~%%J!" set "NXT="
        if not defined NXT (
            if !MAX! lss -!MIN! (set /A "IDX=1-MAX") else (set /A "IDX=MIN-1")
            for /L %%I in (!IDX!,-1,0) do (
                if %%I lss !MIN! (
                    if not "!STR:~%%I,1!"=="!SAV:~%%I,1!" set /A "MIN=%%I"
                )
                if -%%I geq !MAX! (
                    set /A "IDX=%%I+1"
                    for %%J in (!IDX!) do (
                        if not "!STR:~-%%J,1!"=="!SAV:~-%%J,1!" set /A "MAX=1-%%J"
                    )
                )
            )
        )
    )
)
if defined MAX if !MAX! equ 0 set "MAX=8192"
for /F "tokens=1,2" %%I in ("%MIN% %MAX%") do (
    for /F usebackq^ delims^=^ eol^= %%L in ("%~1") do (
        set "STR=%%L"
        echo(!STR:~%%I,%%J!
    )
)
endlocal
exit /B
:LENGTH <rtn_length> <val_string>
::Function to determine the length of a string.
::PARAMETERS:
:: <rtn_length> variable to receive the resulting string length;
:: <val_string> string value to determine the length of;
set "STR=%~2"
setlocal EnableDelayedExpansion
set /A "LEN=1"
if defined STR (
    for %%C in (4096 2048 1024 512 256 128 64 32 16 8 4 2 1) do (
        if not "!STR:~%%C!"=="" set /A "LEN+=%%C" & set "STR=!STR:~%%C!"
    )
) else set /A "LEN=0"
endlocal & set "%~1=%LEN%"
exit /B
This could maybe be improved further, depending also on the data:
- if the length of the first line is fixed, or the line lengths vary only within a small range, you could avoid the `:LENGTH` sub-routine call and use a constant value instead; if a maximum length of the common prefix/suffix is known, the line length is not even needed at all;
- instead of reading the file twice (due to the two-pass algorithm), you could read it into memory at the beginning and use these data later; for huge files this might be a bad idea though;
- I used several `for /L` loops to walk through certain character ranges, whose bodies are skipped by some `if` conditions due to the lack of `while` loops or something like `exit for`; I could have left them using `goto`, but then I would have needed to put these loops in separate sub-routines to not break the outer loops; anyway, `for /L` loops finish iterating in the background even when broken by `goto`, although faster than when executing the body; so together with the slow `call` and `goto`, I doubt that I would have gained much speed; depending on the data, pure `goto` loops could be more efficient as they can be left without any remaining background processing, but of course they would also need to be placed in their own sub-routines.
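To illustrate the last point, here is a minimal sketch of what such a pure `goto` loop in its own sub-routine might look like. It only finds the left-most differing character index of two strings and, like the script above, assumes the strings contain none of `^`, `!` and `"`. The labels `:FIRST_DIFF`, `:FIRST_DIFF_LOOP` and `:FIRST_DIFF_DONE`, the variable `POS` and the sample strings are made up for this example and are not part of the script above; whether this is actually faster depends on the data and has not been measured.

```batch
@echo off
setlocal EnableDelayedExpansion
rem Hypothetical sample strings, chosen for illustration only.
set "SAV=img_001_raw.png"
set "STR=img_017_raw.png"
call :FIRST_DIFF MIN
echo First differing index: !MIN!
endlocal
exit /B

:FIRST_DIFF
rem Hypothetical helper: stores in the variable named by the first argument
rem the index of the left-most character at which STR and SAV differ.
set /A "POS=0"
:FIRST_DIFF_LOOP
rem Stop when the end of STR is reached,
if "!STR:~%POS%,1!"=="" goto :FIRST_DIFF_DONE
rem or as soon as a mismatch is found; no iterations remain after the jump.
if not "!STR:~%POS%,1!"=="!SAV:~%POS%,1!" goto :FIRST_DIFF_DONE
set /A "POS+=1"
goto :FIRST_DIFF_LOOP
:FIRST_DIFF_DONE
set "%~1=%POS%"
exit /B
```

With the sample strings this should print `First differing index: 5`; if both strings are equal, the length of `STR` is returned instead. Every iteration costs a `goto`, so for long common prefixes the `for /L` variant may still win.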
`:break` in your code, which is a bad idea... better call them differently, so it is obvious where execution continues after `goto`. Anyway, the major problem in your code is that you are reading the input file multiple times, which makes it slow; also `goto` loops are quite slow... - aschipfl
"Remove common prefix and/or suffix of unknown length from a list of strings" be a title better describing your task? - user6811411